Search Results: "keithp"

13 September 2013

Daniel Pocock: Calendar and Contact data with free software in the Smartphone era

Since the rise of smartphones, first Blackberry, then iPhone and now Android, the world at large has had a convenient option to fall back on for management of contact and calendar data. The data itself seems somewhat trivial: many of us software developers have worked with a plethora of obscenely more obscure schemas. It is unlikely that most people would ever have to perform an operation on their address book that requires a statistical toolkit like the R project or any of the other advanced tools we now have for data processing. In terms of data, addresses and phone numbers can appear quite boring. So why hasn't the free software community mastered it and cloud providers are currently running rampant hoovering up the data of billions of users? Ruling out LDAP One problem that people have faced is the existence of different standards. LDAP has quite a high profile and appears to be supported by default in many desktop applications. For this reason, it tends to grab people's attention but after a close look, many people find it is not the right solution. LDAP tends to work well for organisations where the data is centrally managed and in situations where fine-grained access control is not required. LDAP servers often need a capable UNIX administer to manage effectively. To top it off, LDAP itself does not provide an effective solution for two-way synchronization of data with remote and mobile users. The emergence of SyncML, CalDAV, GroupDAV and CardDAV Addressing this final problem (two-way mobile/portable device sync) has brought on the development of further standards. Initially, SyncML (now known as Open Mobile Alliance Data Synchronization and Device Management) grabbed significant attention, partly because it had been included in some of the first generation of smartphones from once-prominent vendors like Nokia. Open source server solutions for SyncML did not gain traction though. The most prominent is Funambol, however, it is not packaged in any major Linux distribution and it requires some basic Java knowledge to maintain it effectively. Like LDAP it appears to be most comfortable in enterprise use cases and doesn't appeal to the people who run a private mail and web server. SyncML has further suffered with the lack of support in more recent generations of smartphones. WebDAV was formalised in 1999 and there are now various derivative standards such as CalDAV and CardDAV which use WebDAV as a foundation. However, standards didn't arrive at this place directly: CalDAV only appeared as RFC 4791 in 2007 and CardDAV as RFC 6352 in 2011. In the meantime, some vendors experiemented with an alternative called GroupDAV. GroupDAV is largely obsolete now and GroupDAV v2 is a subset of CalDAV and CardDAV. Finally gaining traction The good news is, CardDAV and CalDAV are gaining traction, in no small part due to support from Apple and the highly proprietary iPhone. Some free software enthusiasts may find that surprising. It appears to be a strategic move from Apple, using open standards to compete with the dominance of Microsoft Exchange in large corporate networks. Both protocols are supported natively in iPhones and Mac OS X. More exciting, Apple releases their Contacts Server and Calendar Server as open source software, making it easier to look under the hood and observe any deviations from the standards. Apple's adoption of CardDAV and CalDAV has been accompanied by a similar effort in free software, however, support is more patchy. The Mozilla Lightning calendar plug-in for Thunderbird (known as the iceowl-extension plugin for icedove on Debian) has supported CalDAV for some time. In contrast, Thunderbird's address book lacks native support for CardDAV and has other limitations (discussed in detail below). Limited CardDAV support in Thunderbird can be achieved using the third-party SOGo Connector, which is not just for ScalableOGo server. The Evolution PIM appears to support both CardDAV and CalDAV relatively smoothly and appears to be more functional and easier to get started with. Free software servers are also emerging. DAViCal, despite the name, supports both CardDAV and CalDAV. It is a good solution for people who have an existing mail server and don't want to transition to an all-in-one groupware suite. Packages available in Debian make it easy to deploy and get started. It is a PHP solution, using a PostgreSQL database underneath. For people who do want a groupware suite, CardDAV and CalDAV are supported in products like ScalableOGo and Zimbra. For people who don't want a server at all and just want to sync between a desktop and a mobile device, there are options like Calypso. Mobile support: the missing link A number of people have blogged about using the Android apps CardDAV-Sync and CalDAV-Sync to complete the picture. I've tested these myself and verified that they work. However, there is one vital thing missing: the source code. Nobody knows what is inside these apps except the developer. That makes it impossible for other developers to troubleshoot them and improve them and it also makes it impossible to be certain that they do not leak information. An alternative is the aCal app for Android. Despite the name, aCal can synchronize the address book as well as the calendar. However, in my tests against DAViCal, I found that the addressbook can only be synced once and that on subsequent syncs, it fails to discover any new server-side contacts. aCal source code is available on gitorious.

aCal appears to be a compelling solution for Android users, hopefully the contact sync will be fixed soon. It only syncs once after installation and then fails to detect any new contacts. Thunderbird's limitations It has already been mentioned that the Thunderbird address book can be synchronized to a CardDAV server by adding the SOGo Connector plugin. However, this is not a perfect solution. The plugin itself is fine: the SOGo web site describes the connector (as opposed to the SOGo Integrator plugin) as a portable solution: in other words, it is vendor neutral, compatible with any standards-compliant server. The first limitation is that it is necessary to duplicate the setup process for both the calendar sync and the address book sync. This is just a question of convenience.

The SOGo Connector plugin configuration window is very trivial. I recommend using read-only mode to avoid the problems with multiple email addresses or telephone numbers. More serious limitations exist in the address book itself: if the CardDAV server has more than 2 email addresses, they will not appear in the Thunderbird address book. Changing an email address on the client side will result in loss of those additional email addresses on the server-side. The Mozilla team initially committed to rewrite the address book from the ground up but it is not clear whether that work is still in progress. DAViCal confusion I had been using DAViCal for quite some time as a CalDAV server before I recently decided to try and use it for CardDAV as well. As a calendar server, it had been working fine. I have a small number of users with some private calendars and also some shared calendars. When I installed the SOGo Connector plugin and went to setup the client-side configuration, it asked me for the server URL. Having used CalDAV already, I simply tried copying the existing CalDAV URL, which is of the form https://myserver.example.org/davical/caldav.php/daniel/home The URL was accepted and I could create address book entries on the client side. No errors appeared. I then tried setting up the same URL as an address book in Evolution. This time, I found I was able to create contacts and sync them to the server and access them from another copy of Evolution. It appeared DAViCal was working fine and I felt that Thunderbird or the SOGo Connector was somehow broken, because the Evolution clients were syncing but Thunderbird was not. The lack of any errors from the Thunderbird GUI further undermined my confidence in the SOGo Connector. Nonetheless, I decided to dig a little deeper, looking through the DAViCal source code. This brought me to the realisation that each URL could be flagged as either a calendar resource or an address book resource. This had not been obvious to me before. Evolution had effectively been storing the contact entries into a calendar URL and DAViCal had been happily accepting them. SOGo Connector, querying the URL, appears to have received both calendar and contact records from DAViCal and given up.

Here is where I configured the extra resource for an address book store under my user account in DAViCal In the DAViCal web management console, I created an extra resource for my user ID and set the appropriate checkbox to mark the resource as an address book. Then I put this resource URL into all the clients and found that they were all able to sync with each other. DAViCal's maintainer has explained that this is a limitation in the CardDAV/CalDAV protocols: technically, it is valid for the client to store the wrong type of objects in each URL. Clients need to be more resilient when they stumble across objects they don't expect in a particular resource. Conclusion A complete free software based contact-management solution can be easily deployed using any of the servers mentioned above, particularly if Evolution is the preferred client software. Using Thunderbird as a client, it can also work, but attention is needed to avoid data loss on sync. Thunderbird could also be used in a read-only mode to access address books maintained with Evolution. Mobile support is still patchy on Android. The closed source solution (CardDAV-Sync and CalDAV-Sync) appears to work and a promising open source close, aCal, appears to be "nearly there". Some people would obviously prefer to use a standalone contact/calendar server, while others may prefer an all-in-one groupware suite. Both approaches appear to be feasible, but the standalone contact/calendar server appears to offer a more compelling way to start quickly and have minimal administrative overhead, particularly due to the fact that DAViCal is available in a supported package.

This is the contact server configuration window in Evolution. It offers slightly more options than SOGo Connector in Thunderbird and appears to be a more finished solution overall.

5 September 2013

Keith Packard: Airfest-altimeter-testing

Altimeter Testing at Airfest Bdale and I, along with AJ Towns and Mike Beattie, spent last weekend in Argonia, Kansas, flying rockets with our Kloudbusters friends at Airfest 19. We had a great time! AJ and Mike both arrived a week early at Bdale s to build L3 project airframes, and both flew successful cert flights at Airfest! Airfest was an opportunity for us to test fly prototypes of new flight electronics Bdale and I have spent the last few weeks developing, and I thought I d take a few minutes today to write some notes about what we built and flew. TeleMega We ve been working on TeleMega for quite a while. It s a huge step up in complexity from our original TeleMetrum, as it has a raft of external sensors and six pyro circuits. Bdale flew TeleMega in his new fiberglass 4 airframe on a Loki 75mm blue M demo motor. GPS tracking was excellent; you can see here that GPS altitude tracked the barometric sensor timing exactly:

GPS lost lock when the motor lit, but about 3 seconds after motor burnout, it re-acquired the satellite signals and was reporting usable altitude data right away. The GPS reported altitude was higher than the baro sensor, but that can be explained by our approximation of an atmospheric model used to convert pressure into altitude. The rest of the flight was also nominal; TeleMega deployed drogue and main chutes just fine. TeleMetrum We ve redesigned TeleMetrum. The new version uses better sensors (MS5607 baro sensor, MMA6555 accelerometer) and a higher power radio (CC1120 40mW). The board is the same size, all the connectors are in the same places so it s a drop-in replacement, and it s still got two pyro channels and USB for configuration, data download and battery charging. I loaded up my Candy-Cane airframe with a small 5 grain 38mm CTI classic:

The flight computer worked perfectly, but GPS reception was not as good as we d like to see:

Given how well TeleMega was receiving GPS signals, I m hopeful that we ll be able to tweak TeleMetrum to improve performance. TeleMini We ve also redesigned TeleMini. It s still a two-channel flight computer with logging and telemetry, but we ve replaced the baro sensor with the MS5607, added on-board flash for increased logging space and added on-board screw terminals for an external battery and power switch. You can still use one of our 3.7V batteries, but you can also use another battery providing from 3.7 to 15V. I was hoping to finish up the firmware and fly it, but I ran out of time before the launch. The good news is that all of the components of the board have been tested and work correctly, and the firmware is feature complete , meaning we ve gotten all of the features coded, it s just not quite working yet. EasyMini EasyMini is a new product for us. It s essentially the same as a TeleMini, but without a radio. Two channels, baro-only, with logging. Like TeleMini, it includes an on-board USB connector and can use either one of our 3.7V batteries, or an external battery from 3.7V to 15V. EasyMini and TeleMini are the same size, and have holes in the same places, so you can swap between them easily. I flew EasyMini in my Koala airframe with a 29mm 3 grain CTI blue-streak motor. EasyMini successfully deployed the main chute and logged flight data:

We also sent a couple of boards home with Kevin Trojanowski and Greg Rothman for them to play with. TeleGPS TeleGPS is a GPS tracker, incorporating a u-blox Max receiver and a 70cm transmitter. It can send position information via APRS or our usual digital telemetry formats. I was also hoping to have the TeleGPS firmware working, and I spent a couple of nights in the motel coding, but didn t manage to finish up. So, no data from this board either. Production Plans Given the success of the latest TeleMega prototype, we re hoping to have it into production first. We ll do some more RF testing on the bench with the boards to make sure it meets our standards before sending it out for the first production run. The goal is to have TeleMega ready to sell by the end of October. TeleMetrum clearly needs work on the layout to improve GPS RF performance. With the testing equipment that Bdale is in the midst of re-acquiring, it should be possible to finish this up fairly soon. However, the flight firmware looks great, so we re hoping to get these done in time to sell by the end of November. TeleMini is looking great from a hardware perspective, but the firmware needs work. Once the firmware is running, we ll need to make enough test flights to shake out any remaining issues before moving forward with it. EasyMini is also looking finished; I ve got a stack of prototypes and will be getting people to fly them at my local launch in another couple of weeks. The plan here is to build a small batch by hand and get them into the store once we re finished testing, using those to gauge interest before we pay for a larger production run.

7 August 2013

Keith Packard: embedded cpus

Choosing Embedded Processors for AltOS When Bdale and I started building rocketry hardware together, we had already picked out a target processor, the TI cc1111. We picked that chip almost entirely based on the digital transceiver that is built in to the chip, and not because we had any particular love for the 8051 microcontroller. At that time, I d seen people struggle with PIC processors, battle AVR to a draw and spend a lot of time trying to get various ARM processors running. So, the 8051 didn t seem all that far from normal, and the cc1111 implementation of it is pretty reasonable, including credible USB support and a built-in DMA engine. Since those early days, we ve gone on to build boards with a slightly wider range of processors:

TI cc1111
Atmel ATmega32U4
Atmel ATiny85
STMicroelectronics STM32L151/STM32L152
NXP LPC11U14

Bdale thinks we should reduce the number of components we use to focus our efforts better. He s probably right, but I have to admit that I ve had way too much fun getting each of these chips running. I thought I d spend a bit of time describing our general process for selecting a new CPU.

Free software. I m long past the age at which I trust hardware vendors to deliver credible binary software. If you want me to use your silicon, make sure the tools required to use it are all free software.
Debian support required. Bdale and I do everything on Debian, so any processor we use needs to have tools that run under Debian. We re more than willing to package software for Debian, so if it s just generic Linux-compatible free software, that s fine.
Built-in USB. Everything we build needs to talk to a computer for configuration and data download. Don t even think of asking me to attach another chip for USB support. Ok, so the ATtiny85 doesn t qualify here, and that project does use an external chip for USB which is connected to the ATtiny85 through an LED and phototransistor. I was excited when TI came out with the CC340 chips because the MSP430 is a much nicer CPU than the 8051. But, they still haven t added USB. If I m going to add another chip for USB, it ll be a CPU anyways, at which point I ll use a stand-alone RF chip.
Cheap developer boards. TI sells a tiny little cc1111 developer board; STmicroelectronics and NXP both sell boards that include a programmer and a target CPU for prototyping software. And, there are a million Arduino clones with various Atmel processors on them. Cheap means I can buy them without thinking too hard; having a built-in programmer means that early OS development can be done without first having to figure out how to wire up a programmer.
Probably 32 bits, probably an ARM. Yeah, the cc1111 and Atmel processors were fun to bring up, but I m pretty much over the 8-bit microcontroller now. ARM processors are far more sophisticated in terms of power management and built-in device capability, plus they re a ton faster at doing any math. The ATtiny85 remains an exception here; it s smaller than any ARM I can get, and so it s still useful for projects too small for an ARM.
Available in small quantities from a reasonable on-line retailer like Digikey, Mouser or even Avnet Express. I d love to try out the new low-cost STM32L100 parts, but the only one available is the largest one in the 64LQFP package, and I really don t need that many pins for most products.

CC1111 involved a lot of software hacking The 8051 processor in the CC1111 is very well documented, including the two-wire debugging interface. What was missing was a canned solution for programming and debugging the chip from Debian. However, with sufficient motivation and accurate docs, I was able to create a programmer from a USB device with a couple of GPIOs and a lot of ugly software on Linux. That sufficed to get the USB stack limping along, at which point I wrote a much faster programmer that ran on the cc1111 itself and could program another cc1111. With that, I created an assembly-level debugger that could be hooked to the existing SDCC source level debugger and had a full source-level debugging environment for the 8051. This turned out to be way better than what I could get on the Atmel processors, where I ve got a program loader and a whole lot of printf debugging. STM has been great The STM32L-Discovery board has a standard STM debugging setup for the Cortex SWD interface right on the same board as a target CPU. That made for completely self-contained development, with no jumper wires (a first for me, for sure). There s free software, stlink which can talk over the debugger USB connection to drive the SWD interface. This is sufficient to flash and debug applications using GDB. Of course, GCC supports ARM quite well; the only hard part was figuring out what to do for a C library. I settled on pdclib, which is at least easy to understand, if not highly optimized for ARM. We ve built a ton of boards with the STM32L151 and STM32L152; at this point I m pretty darn comfortable with the architecture and our tool chain. Adventures with NXP The NXP LPC11U14 is a very different beast. I m using this because:

It s small (5mm x 5mm)
It s cheap ($1.48 q100)

The LPCXpresso board looks much like the STM32L-Discovery, with a debugger interface wired to the CPU directly on the board. However, I haven t found free software tools to drive this programmer; all I ve found are binary-only tools from NXP. No thanks. Fortunately, the LPC11U14 uses exactly the same SWD interface as the STM32L, so I was able to sever the link between the programmer and the target on the LPCXpresso board and hook the target to an ST-Link device (either the one on the STM32L-Discovery board, or the new stand-along programming dongle I bought). With that, I wrote an openocd script to talk to the LPC11U14 and was in business. What I found in the NXP processor was a bit disturbing though there s a mask ROM that contains a boot loader, which always runs when the chip starts, and a bunch of utility code, including the only documented interface to programming the flash memory. I cannot fathom why anyone thought this was a good idea I don t want a BIOS in my embedded CPU, thankyouverymuch; I d really like my code to be the first instructions executed by the CPU. And, any embedded developer is more than capable of programming flash using a register specification, and calling some random embedded code in ROM from the middle of my operating system is more than a bit scary. NXP could do two simple things to make me like their parts a whole lot more:

Publish the programmer source code. The programming dongle is a tool for selling your silicon, not a separate revenue stream. Publish the source under a free software license (GPL recommended, of course). We ll take it, make it better, and ensure that you get the changes back to help your other customers. It s a win-win plan.
Document the flash programming registers. Yeah, I could probably disassemble the mask-rom bits that are on the chip, but I have better things to do with my time. Heck, go wild and document any other hidden bits in the silicon. Again, we re embedded developers, we don t like magic hardware hidden from view any more than PC developers like the BIOS and SMM.

Right now, I m hoping the STM32L100C6 parts become available in small quantities so I can try them out; they promise to be as cheap as the LPC11U14, but are better supported by free software and offer more complete hardware documentation. Yeah, they re a bit larger; that will probably be annoying.

Keith Packard: Cursor tracking

Tracking Cursor Position I spent yesterday morning in the Accessibility BOF here at Guadec and was reminded that one persistent problem with tools like screen magnifiers and screen readers is that they need to know the current cursor position all the time, independent of which window the cursor is in and independent of grabs. The current method that these applications are using to track the cursor is to poll the X server using XQueryPointer. This is obviously terrible for at least a couple of reasons:

Keeps the system active at regular intervals, preventing power savings.
Increased latency in mouse tracking the interval between polling calls limits the time resolution of the position information.

These two problems also conflict with one another. Reducing input latency comes at the cost of further reducing the opportunities for power saving, and vice versa. XInput2 to the rescue (?)

XInput2 has the ability to deliver raw device events right to applications, bypassing the whole event selection mechanism within the X server. This was designed to let games and other applications see relative mouse motion events and drawing applications see the whole tablet surface. These raw events are really raw though; they do not include the cursor position, and so cannot be directly used for tracking. However, we do know that the cursor only moves in response to input device events, so we can easily use the arrival of a raw event to trigger a query for the mouse position. A better plan? Perhaps what we should do is to actually create a new event type to report the cursor position and the containing window so that applications can simply track that. Yeah, it s a bit of a special case, but it s a common requirement for accessibility tools.

 
    CursorEvent
        EVENTHEADER
        detail:                    CARD32
        sourceid:                  DEVICEID
        flags:                     DEVICEEVENTFLAGS
    root:                      WINDOW
    window:                    WINDOW
    root-x, root-y:            INT16
    window-x, window-y:        INT16

A CursorEvent is sent whenever a sprite moves on the screen. sourceid is the master pointer which is moving. root is the root window containing the cursor, window is the window that the pointer is in. root-x and root-y indicate the position within the root window, window-x and window-y indicate the position within window . Demo Application Here s a short application, hacked from Peter Hutterer s part1.c

/* cc -o track_cursor track_cursor.c  pkg-config --cflags --libs xi x11  */
#include <stdio.h>
#include <string.h>
#include <X11/Xlib.h>
#include <X11/extensions/XInput2.h>
/* Return 1 if XI2 is available, 0 otherwise */
static int has_xi2(Display *dpy)
 
    int major, minor;
    int rc;
    /* We support XI 2.2 */
    major = 2;
    minor = 2;
    rc = XIQueryVersion(dpy, &major, &minor);
    if (rc == BadRequest)  
    printf("No XI2 support. Server supports version %d.%d only.\n", major, minor);
    return 0;
      else if (rc != Success)  
    fprintf(stderr, "Internal Error! This is a bug in Xlib.\n");
     
    printf("XI2 supported. Server provides version %d.%d.\n", major, minor);
    return 1;
 
static void select_events(Display *dpy, Window win)
 
    XIEventMask evmasks[1];
    unsigned char mask1[(XI_LASTEVENT + 7)/8];
    memset(mask1, 0, sizeof(mask1));
    /* select for button and key events from all master devices */
    XISetMask(mask1, XI_RawMotion);
    evmasks[0].deviceid = XIAllMasterDevices;
    evmasks[0].mask_len = sizeof(mask1);
    evmasks[0].mask = mask1;
    XISelectEvents(dpy, win, evmasks, 1);
    XFlush(dpy);
 
int main (int argc, char **argv)
 
    Display *dpy;
    int xi_opcode, event, error;
    XEvent ev;
    dpy = XOpenDisplay(NULL);
    if (!dpy)  
    fprintf(stderr, "Failed to open display.\n");
    return -1;
     
    if (!XQueryExtension(dpy, "XInputExtension", &xi_opcode, &event, &error))  
       printf("X Input extension not available.\n");
          return -1;
     
    if (!has_xi2(dpy))
    return -1;
    /* select for XI2 events */
    select_events(dpy, DefaultRootWindow(dpy));
    while(1)  
    XGenericEventCookie *cookie = &ev.xcookie;
    XIRawEvent      *re;
    Window          root_ret, child_ret;
    int         root_x, root_y;
    int         win_x, win_y;
    unsigned int        mask;
    XNextEvent(dpy, &ev);
    if (cookie->type != GenericEvent  
        cookie->extension != xi_opcode  
        !XGetEventData(dpy, cookie))
        continue;
    switch (cookie->evtype)  
    case XI_RawMotion:
        re = (XIRawEvent *) cookie->data;
        XQueryPointer(dpy, DefaultRootWindow(dpy),
                  &root_ret, &child_ret, &root_x, &root_y, &win_x, &win_y, &mask);
        printf ("raw %g,%g root %d,%d\n",
            re->raw_values[0], re->raw_values[1],
            root_x, root_y);
        break;
     
    XFreeEventData(dpy, cookie);
     
    return 0;

Hacks in xeyes Of course, one common mouse tracking application is xeyes, so I ve hacked up that code (on top of my present changes) here:

git clone git://people.freedesktop.org/~keithp/xeyes.git

24 July 2013

Keith Packard: present-redirect

Present Extension Redirection Multi-buffered applications have always behaved poorly in the presence of the Composite extension:

Updating involves multiple copies of the window contents, first from back buffer to composite redirect buffer, thence from the composite buffer to the screen back buffer, and then from the screen back buffer to the scanout buffer. The last copy is amenable to page flipping, but that would require the EGL buffer age extension so that the compositor could reduce the cost of presenting following frames by being able to track the contents of the back buffers.
The application is not informed about when the actual screen presentation occurs.

Owen Taylor suggested that Present should offer a way to redirect operations to the compositing manager as a way to solve these problems. This posting is my attempt to make that idea a bit more concrete given the current Present design. Design Goals Here s a list of features I think we should try to provide:

Provide accurate information to applications about when presentation to the screen actually occurs. In particular, GLX applications using GLX_OML_sync_control should receive the correct information in terms of UST and MSC for each Swap Buffers request.
Ensure that applications still receive correct information as to the contents of their buffers, in particular we want to be able to implement EGL_EXT_buffer_age in a useful manner.
Avoid needing to un-redirect full-screen windows to get page flipping behavior.
Eliminate all extra copies. A windowed application may still perform one copy from back buffer to scanout buffer, but there shouldn t be any reason to copy contents to the composite redirection buffer or the compositing manager back buffer.

Simple Present Redirection With those goals in mind, here s what I see as the sequence of events for a simple windowed application doing a new full-window update without any translucency or window transformation in effect:

Application creates back buffer, draws new frame to it.
Application executes PresentRegion. In this example, the valid and update parameters are None , indicating that the full window should be redrawn.
The server captures the PresentRegion request and constructs a PresentRedirectNotify event containing sufficient information for the compositor to place that image correctly on the screen:
- target window for the presentation
- source pixmap containing the new image
- idle fence to notify when the source pixmap is no longer in use.
- serial number from the request.
- target MSC for the presentation. This should probably just be the computed absolute MSC value, and not the original target/divisor/numerator values provided by the application.
The compositing manager receives this event and constructs a new PresentRegion request using the provided source pixmap, but this time targeting the root window, and constructing a valid region which clips the pixmap to the shape of the window on the screen. This request would use the original application s idle fence value so that when complete, the application would get notified. This request would need to also include the original target window and serial number so that a suitable PresentCompleteNotify event can be constructed and delivered when the final presentation is complete.
The server executes this new PresentRegion operation. When complete, it delivers PresentCompleteNotify events to both the compositing manager and the application.
Once the source pixmap is no longer in use (either the copy has completed, or the screen has flipped away from this pixmap), the server triggers the idle fence.

Multiple Application Redirection If multiple applications perform PresentRegion operations within the same frame, then the compositing manager will receive multiple PresentRedirectNotify events, and can simply construct multiple new PresentRegion requests. If these are all queued to the same global MSC, they will execute at the same frame boundary. No inter-operation dependency exists here. Complex Presentations Ok, so the simple case looks like it s pretty darn simple to implement, and satisfies the design goals nicely. Let s look at a couple of more complicated cases in common usage; the first is with translucency, the second with scaling application images down to thumbnails and the third with partial application window updates. Redirection with Translucency If the compositing manager discovers that a portion of the updated region overlays or is overlaid by something translucent (either another window, or drop shadows, or something else), then a composite image for that area must be constructed before the whole can be presented. Starting when the compositing manager receives the event, we have:

The compositing manager receives this event. Using the new pixmap, along with pixmaps for the other involved windows and graphical elements, the compositing manager constructs an updated view for the affected portion of the screen in a back buffer pixmap. Once complete, a PresentRegion operation that uses this back buffer pixmap is sent to the X server. Again, the original target window and serial number are also delivered to the server so that a suitable PresentCompleteNotify event can be delivered to the application.
The server executes this new PresentRegion operation; PresentCompleteNotify events are delivered, and idle fences triggered as appropriate.

Redirection with Transformation Transformation of the window contents means that we cannot always update a portion of the back buffer directly from the provided application pixmap as that will not contain the window border. Contents generated from a region that includes both application pixels and window border pixels must be sourced from a single pixmap containing both sets of pixels. One option that I ve discussed in the past to solve this would be to have the original application allocate the pixmap large enough to hold both the application contents and the surrounding window border. Have it draw the application contents at the correct offset within this pixmap, and then have the window manager contents drawn around that; either automatically by the X server, or even manually by the compositing manager. That would be mighty convenient for the compositing manager, but would require significant additional infrastructure throughout the X server and even harder the drawing system (OpenGL or some other system). There s another reason to want this though, and that s for sub-frame buffer scanout page swapping. The second option would be for the compositing manager to combine these images itself; there s a nice pixmap already containing the window manager image the composite redirect buffer. Taking the provided source pixmap and copying it directly to the target window will construct our composite image, just as if we had no Present redirection in place. This will cost an additional copy though, which we ve promised to avoid. Of course, as it s just for thumb-nailing or other visual effects, perhaps the compositing manager could perform this operation at a reduced frame rate, so that overall system performance didn t suffer. Retaining Access to the Application Buffer Above, I discussed having the idle fence from the redirected PresentRegion operation be sent along with the replacement PresentRegion operation. This ignores the fact that the composting manager may well need the contents of that application frame again in the future, when displaying changes for other applications that involve the same region of the screen. With the goal of making sure the idle fences are triggered as soon as possible so that applications can re-use or free the buffers quickly, let s think about when the triggering can occur.

Full-screen flipped applications. In this case, the application s idle fence can be triggered once the application provides a new frame and the X server has flipped to that new frame, or some other scanout buffer.
Windowed, copied applications. In this case, the application s idle fence can be triggered once the application provides a new frame to the compositing manager, and the X server doesn t have any presentations queued.

In both cases, we require that both the X server and the compositing manager be finished with the buffer before the application s idle fence should be triggered. One easy way to get this behavior is for the composting manager to create a new idle fence for its operations. When that is triggered, it would receive an X event and then trigger the applications idle fence as appropriate. This would add considerable latency to the application s idle fence a round trip through the compositing manager. The alternative would be to construct some additional protocol to make the applications idle fence dependent on the Present operation and some additional state provided by the compositing manager. Some experimentation here is warranted, but my experience with latency in this area is that it causes applications to end up allocating another back buffer as the idle notification arrives just after a buffer allocation request comes down to the rendering library. Definitely sub-optimal. An Aside on Media Stream Counters The GLX_OML_sync_control extension defines the Media Stream Counter (MSC) as a counter unique to the graphics subsystem which is incremented each time a vertical retrace occurs. That would be trivial if we had only one vertical retrace sequence in the world. However, we actually have N+1 such counters, one for each of the N active monitors in the system and a separate fake counter to be used when none of the other counters is available. In the current Present implementation, windows transition among the various Media Stream Counter domains as they move between the various monitors, and those monitors get turned on and off. As they move between these counter domains, Present tracks a global offset from their original domain. This offset ensures that the MSC value remains monotonically increasing as seen by each window. What it does not ensure is that all windows have comparable MSC sequence values; two windows on the same monitor may well have different MSC values for the same physical retrace event. And, even moving a window from one MSC domain to another and back won t make it return to the original MSC sequence values due to differences in refresh rates between the monitors. Internally, Present asserts that each CRTC in the system identifies a unique MSC domain, and it has a driver API which identifies which CRTC a particular window should be associated with. Once a particular CRTC has been identified for a window, client-relative MSC values and CRTC-relative MSC values can be exchanged using an offset between that CRTC MSC domain and the window MSC domain. The Intel driver assigns CRTCs to windows by picking the CRTC showing the greatest number of pixels for a particular window. When two CRTCs show the same number of pixels, the Intel driver picks the first in the list. Vblank Synchronization and Multiple Monitors Ok, so each window lives in a particular MSC domain, clocked by the MSC of the CRTC the driver has associated it with. In an un-composited world, this makes picking when to update the screen pretty simple; Present updates the screen when vblank happens in the CRTC associated with the window. In the composite redirected case, it s a bit harder all of the PresentRegion operations are going to target the root window, and yet we want updates for each window to be synchronized with the monitor containing that window. Of course, the root window belongs to a single MSC domain (likely the largest monitor, using the selection algorithm described above from the Intel driver). So, any PresentRegion requests will be timed relative to that single monitor. I think what is required here is for the PresentRegion extension to take an optional CRTC argument, which would then be used as the MSC domain instead of the window MSC domain. All of the timing arguments would be interpreted relative to that CRTC MSC domain. The PresentRedirectNotify event would then contain the relevant CRTC and the MSC value would be relative to that CRTC. A clever Compositing manager could then decompose a global PresentRegion operation into per-CRTC PresentRegion operations and ensure that multiple monitors were all synchronized correctly. We could take this even further and have the PresentRegion capable of passing a smaller CRTC-sized pixmap down to the kernel, effectively providing per-CRTC pixmaps with no visible explicit protocol Other Composite Users Ok, so the above discussion is clearly focused on getting the correct contents onto the screen with minimal copies along the way. However, what I ve ignored is how to deal with other applications, also using Composite at the same time. They re all going to expect that the composite redirect buffers will contain correct window contents at all times, and yet we ve just spent a bunch of time making that not be the case to avoid copying data into those buffers and instead copying directly to the compositing manager back or front buffers. Obviously the X server is aware of when this happens; the compositing manager will have selected for manual redirection on all top-level windows, while our other application will have only been able to select for automatic redirection. So, we ve got two pretty clear choices here:

Have the X server change how Present redirection works when some other application selects for Automatic redirection on a window. It would copy the source pixmap into the window buffer and then send (a modified?) PresentRedirectNotify event to the compositing manager.
Include a flag in the PresentRedirectNotify event that the composite redirect buffer needs to eventually get the contents of the source pixmap, and then expect the compositing manager to figure out what to do.

Development Plans As usual, I m going to pick the path of least resistance for all of the above options and see how things look; where the easy thing works, we can keep using it. Where the easy thing fails, I ll try something else. The changes required for this are pretty minimal. The PresentRegion request needs to gain a list of window/serial pairs that are also to be notified when the operation completes:

PRESENTNOTIFY  
    window: WINDOW
    serial: CARD32
     
 
    PresentRegion
    window: WINDOW
    pixmap: PIXMAP
    serial: CARD32
    valid-area: REGION or None
    update-area: REGION or None
    x-off, y-off: INT16
    idle-fence: FENCE
    target-crtc: CRTC or None
    target-msc: CARD64
    divisor: CARD64
    remainder: CARD64
    notifies: LISTofPRESENTNOTIFY
 
    Errors: Window, Pixmap, Match

The target-crtc parameter explicitly identifies a CRTC MSC domain. If None, then this request implicitly uses the window MSC domain. notifies provides a list of windows that will also receive PresentCompleteNotify events with the associated serial number when this PresentRegion operation completes.

 
    PresentRedirectNotify
    type: CARD8         XGE event type (35)
    extension: CARD8        Present extension request number
    length: CARD16          2
    evtype: CARD16          Present_RedirectNotify
    eventID: PRESENTEVENTID
    event-window: WINDOW
    window: WINDOW
    pixmap: PIXMAP
    serial: CARD32
    valid-area: REGION
    valid-rect: RECTANGLE
    update-area: REGION
    update-rect: RECTANGLE
    x-off, y-off: INT16
    target-crtc: CRTC
    target-msc: CARD64
    idle-fence: FENCE
    update-window: BOOL

The target-crtc identifies which CRTC MSC domain the target-msc value relates to. divisor and remainder have been removed as the target-msc value has been adjusted using the application values. If update-window is True, then the recipient of this event is instructed to provide reasonably up-to-date contents directly to the window by copying the contents of pixmap to the window manually. Beyond these two protocol changes, the compositing manager is expected to receive Sync events when the idle-fence is triggered and then manually perform a Sync operation to trigger the client s idle-fence when appropriate. I m planning to work on these changes, and then go re-work xcompmgr (or perhaps unagi, which certainly looks less messy) to incorporate support for Present redirection. The goal is to have something to demonstrate at Guadec, which doesn t seem impossible, aside from leaving on vacation in four days

23 July 2013

Keith Packard: present sync

Implementing Vblank Synchronization in the Present Extension This is mostly a status update on how the Present extension is doing; the big news this week is that I ve finished implementing vblank synchronized blts and flips, and things seem to be working quite well. Vblank Synchronized Blts The goal here is to have the hardware executing the blt operation in such a way as to avoid any tearing artifacts. In current drivers, there are essentially two different ways to make this happen:

Insert a command into the ring which blocks execution until a suitable time immediately preceding the blt operation.
Queue the blt operation at vblank time so that it executes before the scanout starts.

Option 1. provides the fewest artifacts; if the hardware can blt faster than scanout, there shouldn t ever be anything untoward visible on the screen. However, it also blocks future command execution within the same context. For example, if two vblank synchronized blts are queued at the same time, its possible for the second blt to be delayed by yet another frame, causing both applications to run at half of the frame rate. Option 2. avoids blocking the hardware, allowing for ongoing operations to proceed on the hardware without waiting for the synchronized blt to complete. However, it can cause artifacts if the delay from the vblank event to the eventual execution of the blt command is too long. Queuing the blt right when it needs to execute means that we also have the opportunity to skip some blts; if the application presents two buffers within the same frame time, the blt of the first buffer can be skipped, saving memory bandwidth and time. Present uses Option 2, which may occasionally cause a tearing artifact, but avoids slowing down applications while allowing the X server to discard overlapping blt operations when possible. Queuing the Blt at Vblank There are several options for getting the blt queued and executed when the vblank occurs:

Queue the blt from the interrupt handler
Queue the blt from a kernel thread running in response to the interrupt
Send an event up to user space and have the X server construct the blt command.

These are listed in increasing maximum latency, but also in decreasing complexity. Option 1. is made more complicated as much of the work necessary to get a command queued to the hardware cannot be done from interrupt context. One can imagine having the desired command already present in the ring buffer and have the interrupt handler simply move the ring tail pointer value. Future operations to be queued before the vblank operation could then re-write the ring as necessary. A queued operation could also be adjusted by the X server as necessary to keep it correct across changes to the window system state. Option 2. is similar, but the kernel implementation should be quite a bit simpler as the queuing operation is done in process context and can use the existing driver infrastructure. For the X server, this is the same as Option 1, requiring that it construct a queued blt operation and deliver that to the kernel, and then revoke and re-queue if the X server state changed before the operation was completed. Option 3 is the simplest of all, requiring no changes within the kernel and few within the X server. The X server waits to receive a vblank notification event for the appropriate frame and then simply invokes existing mechanisms to construct and queue the blt operation to the kernel. Oddly, Present currently uses Option 3. If that proves to generate too many display artifacts, we can come back and change the code to try something more complicated. Flipping the Frame Buffer Taking advantage of the hardware s ability to quickly shift scanout from one chunk of memory to another is critical to providing efficient buffer presentation within the X server. It is slightly more complicated to implement than simply copying data to the current scanout buffer for a few reasons:

The presented pixmap is owned by the application, and so it shouldn t be used except when the presented window covers the whole screen. When the window gets reconfigured, we end up copying the window s pixmap to the regular screen pixmap.
The kernel flipping API is asynchronous, and doesn t provide any abort mechanism. This isn t usually much of an issue; we simply delay reporting the actual time of flip until the kernel sends the notification event to the X server. However, if the window is reconfigured or destroyed while the flip is still pending, cleaning that up must wait until the flip has finished.
The applications buffer remains busy until it is no longer being used for scanout; that means that applications will have to be aware of this and ensure that they don t deadlock waiting for the current scanout buffer to become idle before switching to a new scanout buffer.

Present is different from DRI2 in using application-allocated buffers for this operation. For DRI2, when flipping to a window buffer, that buffer becomes the screen pixmap the driver flips the new buffer object into the screen pixmap and releases the previous buffer object for other use. For Present, as the buffer is owned by the application, I figured it would be better to switch back to the real screen buffer when necessary. This also means that applications aren t left holding a handle to the frame buffer, which seems like it might be a nice feature. The hardest part of this work was dealing with client and server shutdown, dealing with objects getting deleted in random orders while other data structures retained references. (The kernel DRM drivers use the term page flipping to mean an atomic flip between one frame buffer and another, generally implemented by simply switching the address used for the scanout buffer. I d like to avoid using the word page in this context as we re not flipping memory pages individually, but rather a huge group of memory that forms an entire frame buffer. We could use plane flipping (as intel docs do), frame buffer flipping (but that s a mouthful), display flipping or almost anything but page flipping ). Overall DRI3000 Status At this point, the DRI3 extension is complete and the Present extension is largely complete, except for redirection for compositors. The few piglit tests for GLX_OML_sync_control all pass now, which is at least better than DRI2 does. I think I ve effectively replicated the essential features of DRI2 while offering room to implement a couple of new GL extensions:

GLX_EXT_swap_control_tear. This will provide applications with the ability to avoid dropping frames when pushing the hardware just over the frame rate limit.
EGL_EXT_buffer_age. (I assume we ll probably want a GLX version as well?) This will allow compositors to more efficiently perform partial updates in a flipping environment, and is enabled by having all of the buffer management within the GL library.

The code for this stuff has all been pushed to a number of repositories:

git://people.freedesktop.org/~keithp/dri3proto.git master. DRI3 protocol specification and X server headers.
git://people.freedesktop.org/~keithp/presentproto.git master. Present protocol specification and X server headers.
git://people.freedesktop.org/~keithp/xcb/proto.git dri3. XCB protocol defines for both DRI3 and Present.
git://people.freedesktop.org/~keithp/xcb/libxcb.git dri3. XCB library changes for file descriptor passing.
git://people.freedesktop.org/~keithp/xserver.git dri3. X server with file descriptor passing, DRI3 and Present support.
git://people.freedesktop.org/~keithp/mesa.git dri3 Mesa with DRI3/Present support for GLX.
git://people.freedesktop.org/~keithp/drm.git dri3. DRM library with defines for async flipping.
git://people.freedesktop.org/~keithp/xf86-video-intel.git dri3. Intel driver with DRI3, Present and async flipping support.
git://git.kernel.org/pub/scm/linux/kernel/git/keithp/linux-2.6 dri3. Kernel with async flipping.

22 July 2013

Keith Packard: async flip

Asynchronous Display Flipping for Linux Using display (or plane or page) flipping is a nice optimization for screen updates as it avoids copying the application render buffer to the separate scanout buffer. However, in the current DRM interface, the flip operation is not performed immediately. Rather, the flip operation must be presented to the kernel before the vertical blank interval preceding the desired display time. If the flip operation is even slightly delayed, the new scanout image will not be visible until the following frame. With many applications pushing the performance of the graphics hardware right to the edge, it s not at all uncommon for an occasional frame to just miss and cause an ugly jerk on the screen. The GLX_EXT_swap_control_tear extension provides a way for applications to request that when the swap request is delivered too late, instead of delaying for the next frame, the contents should be presented as soon as possible, even if that might cause some tearing artifacts on the screen. This gives the application developer the choice between the two visual artifacts (tearing vs judder). A Linux interface for Async Flipping This part was pretty easy to implement. First, a new capability value, DRM_CAP_ASYNC_PAGE_FLIP was added so that applications could discover whether the driver offered this feature or not. Second, a new flag value, DRM_MODE_PAGE_FLIP_ASYNC was added to the DRM_IOCTL_MODE_PAGE_FLIP ioctl. To get the new flag down into the driver required adding the flag parameter to the page_flip crtc function. Async Display Flipping support in Intel Hardware Recent Intel display hardware has support for asynchronous flipping, either through the ring using the MI_DISPLAY_FLIP instruction or directly through the registers. There are some limitations however:

The pitch of the current and new frame buffers must match
Both current and new frame buffers must use X tiling

I ve also only implemented flipping on Sandybridge and Ivybridge hardware as that s what I ve got for testing. Using Async Display Flipping with Present I m using async display flipping when the kernel supports it in the current Present implementation. This offers the ability to short-circuit multiple presentations targeting the same scanout frame. I haven t exposed the ability to select this to applications over the Present protocol yet, but that s not a big change. Implementing the GLX_EXT_swap_control_tear extension I haven t even looked at implementing this extension; I m hoping that once I ve gotten the functionality exposed over the X protocol that someone will at least help me figure out how to hook this up. Getting the Bits The interesting pieces here are the new kernel bits; the DRM library changes simply add new defines to existing APIs.

Kernel bits are in my kernel.org repository git://git.kernel.org/pub/scm/linux/kernel/git/keithp/linux-2.6 dri3
DRM library bits are in my freedesktop.org drm repository git://people.freedesktop.org/home/keithp/drm.git dri3

12 July 2013

Keith Packard: Present

The Present Extension I ve finally gotten an initial implementation of the Present extension written and running and thought I should write up the current design and status. Present Design The current Present extension consists of two requests:

PresentRegion. Puts new bits from a pixmap in a window.
PresentSelectInput. Asks for Present events to be delivered.

PresentRegion This request takes a pile of arguments:

 
    PresentRegion
    window: WINDOW
    pixmap: PIXMAP
    valid-area: REGION or None
    update-area: REGION or None
    x-off, y-off: INT16
    target-msc: CARD64
    divisor: CARD64
    remainder: CARD64
    idle-fence: FENCE
 
Errors: Drawable, Pixmap, Match

Provides new content for the specified window, to be made visible at the specified time (defined by target-msc , divisor and remainder ). update-area defines the subset of the window to be updated, or None if the whole window is to be updated. valid-area defines the portion of pixmap which contains valid window contents, or None if the pixmap contains valid contents for the whole window. PresentRegion may use any region of pixmap which contains update-area and which is contained by valid-area . In other words, areas inside update-area will be presented from pixmap , areas outside valid-area will not be presented from pixmap and areas inside valid-area but outside update-area may or may not be presented at the discretion of the X server. x-off and y-off define the location in the window where the 0,0 location of the pixmap will be presented. valid-area and update-area are relative to the pixmap. If target-msc is greater than the current msc for window , the presentation will occur at (or after) the target-msc field. Otherwise, the presentation will occur after the next field where msc % divisor == remainder . If divisor is zero, then the presentation will occur after the current field. idle-fence is triggered when pixmap is no longer in use. This may be at any time following the PresentRegion request, the contents may be immediately copied to another buffer, copied just in time for the vblank interrupt or the pixmap may be used directly for display, in which case it will be busy until some future PresentRegion operation. If window is destroyed before the presentation occurs, then the presentation action will not be completed. PresentRegion holds a reference to pixmap until the presentation occurs, so pixmap may be immediately freed after the request executes, even if that is before the presentation occurs. If idle-fence is destroyed before the presentation occurs, then idle-fence will not be signaled but the presentation will occur normally. PresentSelectInput

 
    PresentSelectInput
    event-id: PRESENTEVENTID
    window: WINDOW
    eventMask: SETofPRESENTEVENT
 
Errors: Window, Value, Match, IDchoice, Access

Selects the set of Present events to be delivered for the specified window and event context. PresentSelectInput can create, modify or delete event contexts. An event context is associated with a specific window; using an existing event context with a different window generates a Match error. If eventContext specifies an existing event context, then if eventMask is empty, PresentSelectInput deletes the specified context, otherwise the specified event context is changed to select a different set of events. If eventContext is an unused XID, then if eventMask is empty no operation is performed. Otherwise, a new event context is created selecting the specified events. Present Extension Events There are three different events for the Present extension:

PresentConfigureNotify
PresentCompleteNotify
PresentRedirectNotify

PresentConfigureNotify This event is moving from the DRI3 extension where it doesn t belong.

 
    PresentConfigureNotify
    type: CARD8         XGE event type (35)
    extension: CARD8        Present extension request number
    length: CARD16          2
    evtype: CARD16          Present_ConfigureNotify
    eventID: PRESENTEVENTID
    window: WINDOW
    x: INT16
    y: INT16
    width: CARD16
    height: CARD16
    off_x: INT16
    off_y: INT16
    pixmap_width: CARD16
    pixmap_height: CARD16
    pixmap_flags: CARD32

PresentConfigureNotify events are sent when the window configuration changes if PresentSelectInput has requested it. PresentConfigureNotify events are XGE events and so do not have a unique event type. x and y are the parent-relative location of window . PresentCompleteNotify

 
    PresentCompleteNotify
    type: CARD8         XGE event type (35)
    extension: CARD8        Present extension request number
    length: CARD16          2
    evtype: CARD16          Present_CompleteNotify
    eventID: PRESENTEVENTID
    window: WINDOW
    ust: CARD64
    msc: CARD64
    sbc: CARD64

Notify events are delivered when a PresentRegion operation has completed and the specified contents are being displayed. sbc, msc and ust indicate the swap count, frame count and system time of the related PresentRegion request. PresentRedirectNotify This one is not specified yet, but the intent is for it to contain sufficient information for the compositing manager to be able to construct a suitable screen update that includes an application window update. Finishing GLXOMLsync_control At this point, Mesa only exposes the old swapinterval configuration value, it doesn t provide any of the GLXOMLsynccontrol APIs. However, the Present protocol does have the bits necessary to support glXSwapBuffersMscOML in the PresentRegion request. Let s see how the remaining APIs in this GL extension will be supported. glXGetSyncValuesOML This one is easy; it only needs to return the UST/MSC/SBC values from the most recent SwapBuffers request. Those are returned in the PresentCompleteNotify event, so Mesa just needs to capture that event and save the values away for return to the application. We don t need any new Present protocol for this. glXGetMscRateOML This returns the refresh rate of the monitor associated with the specified drawable. RandR exposes all of the necessary data for each monitor, but the monitor each window is going to be synchronized against isn t exposed anywhere, and is effectively implementation-dependent. So, I think the easy thing to do here is to have a Present request which reports which RandR output a window will be synchronized with. glXSwapBuffersMscOML This is the one API which is already directly supported by the Present extension. glXWaitForMscOML This is effectively the same as glXSwapBuffersMscOML, except that it doesn t actually perform a swap. For this, I think we want a new request that generates an X event when the target values are reached, and then have the client block until that event is received. This will avoid blocking other threads using the same X connection. glXWaitForSbcOML Like glXWaitForMscOML, this just needs to trigger an event to be delivered at the right time. Presentation Redirection I m focusing on finishing the above stuff before I start writing the redirection spec and code, but I ve been thinking a bit about it. What we want is for applications that are presenting a new frame within a redirected window to have that presentation delivered directly to the compositing manager instead of having the bits copied to the redirected buffer and have the compositing manager only learn about this when the damage event is received from copying the bits. Once the compositing manager receives notification that there are new bits available for a portion of a window, it can then get those bits onto the screen in one of two ways:

Copy them to the back buffer for the whole screen and then present that back buffer to the screen.
Present them directly to the screen, bypassing the compositing manager s back buffer entirely.

The first option is what you d do if there were more updates than just a single window; it would update the whole screen at the same time. The second option is what you d do if there was only the one window update to act on, and this would automatically take advantage of page flipping to avoid copying data at all. No need to un-redirect windows for the page flip to work. In both cases, we need to inform the original application both when its pixmap is idle and when the swap actually hits the screen. And we need to make sure the final swap happens when the application requested. For the pixmap idle notification, the current PresentRegion idlefence argument should suffice; we just need to pass the idlefence XID along in the redirection event. Simple. For completion notification, we need to make sure the SBC value for the original window gets incremented and that a PresentCompleteNotify event is delivered. I think that means we want to append a chunk of data to the PresentRegion request so that suitable events can be delivered when the compositing manager PresentRegion occur. I think that just needs to be the window of that original window. To make sure the swap happens at the right time, we just need to have the targetmsc value provided to the compositing manager. I m hoping that having this all event driven is that when a later PresentRegion is redirected that has an *earlier* targetmsc, we can simply queue that as well and things should just work . Current Status I ve got the protocol definitions (both X server and XCB) done, and libxcb supporting the extension. I ve got the X server bits working, but the actual updates are not synchronized to the monitor; they re using OS timers and CopyArea for now, mostly so I can test things without also hacking drivers. Mesa is using the extension, but it only provides the swapinterval value and is not yet supporting the full GLXOMLsynccontrol extension. I ve also got a simple 2D core X application using the extension which is in the shmfd repository in my home directory on freedesktop.org. Availability As always, these bits are already published in my home directory on freedesktop.org in various git repositories.

5 June 2013

Guido G nther: Calendar synchronisation between Nokia N900 and the Calypso CalDAV server

One of the replies to the post about Debian's last groupware meeting was from Patrick Ohly of syncevolution fame pointing out that syncevolution already implements calendar autodetection for CalDAV calendars as described in draft-daboo-srv-caldav-10. While looking at the code I noticed that there's a backend for the N900s calendar by Ove K ven as well. When I tried Ove's latest package on my N900 it lead to an immediate crash when doing a:

syncevolution --print-items target-config@webdav calendar

According to Patrick the bug was supposed to be fixed in recent versions so I set up scratchbox and built a newer git snapshot for maemo (sources). This wouldn't crash but didn't show up any items either. It turned out to be a minor bug in calypso returning no content type for REPORT queries which resulted in libneon discarding the whole reply (now already fixed in calypso upstream). With this out of the way setting up synchronisation is quiet simple:

# Configuration
CALDAV_SERVER=192.168.0.10
syncevolution --configure username=<username> password=<password> \
              calendar/backend=caldav calendar/database=https://$ CALDAV_SERVER :5233/private/my_calendar \
              target-config@webdav calendar
syncevolution --configure --template SyncEvolution_Client sync=none syncURL=local://@webdav username= password= webdav
syncevolution --configure sync=two-way backend=calendar webdav calendar

You should then be able to print the items on the local (N900) and from the remote (CalDAV server) end:

# This lists the current calendar items on the server
syncevolution --print-items target-config@webdav calendar
# This lists the current calendar items on the N900
syncevolution --print-items @default calendar

And from there on sync away:

# initial slow sync
syncevolution --sync slow webdav
# from there on
syncevolution webdav

The syncevolution source code has great documentation about debugging problems (e.g. src/backends/webdav/README). So check that in case you run into problems. The tl;dr version is

SYNCEVOLUTION_DEBUG=1 src/syncevolution loglevel=10 --print-items target-config@webdav calendar

to debug CalDAV related problems. In case you need to run syncevoluton from source be sure to set these beforehand:

export SYNCEVOLUTION_TEMPLATE_DIR=$PWD/src/templates/
export SYNCEVOLUTION_XML_CONFIG_DIR=$PWD/src/syncevo/configs/

On the CalDAV side I used current Calypso git which (with some additional minor fixes) now also interoperates nicely with Iceowl/Icowl-Extension aka Sunbird/Lightning on the desktop side. There's also an ITP for it. So it'll hopefully end up in Debian soon.

4 June 2013

Keith Packard: dri3 extension

Completing the DRI3 Extension This week marks a pretty significant milestone for the glorious DRI3000 future. The first of the two new extensions is complete and running both full Gnome and KDE desktops. DRI3 Extension Overview The DRI3 extension provides facilities for building direct rendering libraries to work with the X window system. DRI3 provides three basic mechanisms:

Open a DRM device.
Share kernel objects associated with X pixmaps. The direct rendering client may allocate kernel objects itself and ask the X server to construct a pixmap referencing them, or the client may take an existing X pixmap and discover the underlying kernel object for it.
Synchronize access to the kernel objects. Within the X server, Sync Fences are used to serialize access to objects. These Sync Fences are exposed via file descriptors which the underlying driver can use to implement synchronization. The current Intel DRM driver passes a shared page containing a Linux Futex.

Opening the DRM Device Ideally, the DRM application would be able to just open the graphics device and start drawing, sending the resulting buffers to the X server for display. There s work going on to make this possible, but the current situation has the X server in charge of blessing the file descriptors used by DRM clients. DRI2 does this by having the DRM client fetch a magic cookie from the kernel and pass that to the X server. The cookie is then passed to the kernel which matches it up with the DRM client and turns on rendering access for that application. For DRI3, things are much simpler the DRM client asks the X server to pass back a file descriptor for the device. The X server opens the device, does the magic cookie dance all by itself (at least for now), and then passes the file descriptor back to the application.

 
    DRI3Open
    drawable: DRAWABLE
    driverType: DRI3DRIVER
    provider: PROVIDER
       
    nfd: CARD8
    driver: STRING
    device: FD
 
    Errors: Drawable, Value, Match
    This requests that the X server open the direct rendering
    device associated with drawable, driverType and RandR
    provider. The provider must support SourceOutput or SourceOffload.
    The direct rendering library used to implement the specified
    'driverType' is returned in 'driver'. The file
    descriptor for the device is returned in 'device'. 'nfd' will
    be set to one (this is strictly a convenience for XCB which
    otherwise would need request-specific information about how
    many file descriptors were associated with this reply).

Sharing Kernel Pixel Buffers An explicit non-goal of DRI3 is support for sharing buffers that don t map directly to regular X pixmaps. So, GL ancillary buffers like depth and stencil just don t apply here. The shared buffers in DRI3 are regular X pixmaps in the X server. This provides a few obvious benefits over the DRI2 scheme: In the kernel, the buffers are referenced by DMA-BUF handles, which provides a nice driver-independent mechanism.

Lifetimes are easily managed. Without being associated with a separate drawable, it s easy to know when to free the Pixmap.
Regular X requests apply directly. For instance, copying between buffers can use the core CopyArea request.

To create back- and fake-front- buffers for Windows, the application creates a kernel buffer, associates a DMA-BUF file descriptor with that and then sends the fd to the X server with a pixmap ID to create the associated pixmap. Doing it in this direction avoids a round trip.

 
    DRI3PixmapFromBuffer
    pixmap: PIXMAP
    drawable: DRAWABLE
    size: CARD32
    width, height, stride: CARD16
    depth, bpp: CARD8
    buffer: FD
 
    Errors: Alloc, Drawable, IDChoice, Value, Match
    Creates a pixmap for the direct rendering object associated
    with 'buffer'. Changes to pixmap will be visible in that
    direct rendered object and changes to the direct rendered
    object will be visible in the pixmap.
    'size' specifies the total size of the buffer bytes. 'width',
    'height' describe the geometry (in pixels) of the underlying
    buffer. 'stride' specifies the number of bytes per scanline in
    the buffer. The pixels within the buffer may not be arranged
    in a simple linear fashion, but 'size' will be at least
    'height' * 'stride'.
    Precisely how any additional information about the buffer is
    shared is outside the scope of this extension.
    If buffer cannot be used with the screen associated with
    drawable, a Match error is returned.
    If depth or bpp are not supported by the screen, a Value error
    is returned.

To provide for texture-from-pixmap, the application takes the pixmap ID and passes that to the X server which returns the a file descriptor for a DMA-BUF which is associated with the underlying kernel buffer.

 
    DRI3BufferFromPixmap
    pixmap: PIXMAP
       
    depth: CARD8
    size: CARD32
    width, height, stride: CARD16
    depth, bpp: CARD8
    buffer: FD
 
    Errors: Pixmap, Match
    Pass back a direct rendering object associated with
    pixmap. Changes to pixmap will be visible in that
    direct rendered object and changes to the direct rendered
    object will be visible in the pixmap.
    'size' specifies the total size of the buffer bytes. 'width',
    'height' describe the geometry (in pixels) of the underlying
    buffer. 'stride' specifies the number of bytes per scanline in
    the buffer. The pixels within the buffer may not be arranged
    in a simple linear fashion, but 'size' will be at least
    'height' * 'stride'.
    Precisely how any additional information about the buffer is
    shared is outside the scope of this extension.
    If buffer cannot be used with the screen associated with
    drawable, a Match error is returned.

Tracking Window Size Changes When Eric Anholt and I first started discussing DRI3, we hoped to avoid needing to learn about the window size from the X server. The thought was that the union of all of the viewports specified by the application would form the bounds of the drawing area. When the window size changed, we expected the application would change the viewport. Alas, this simple plan isn t sufficient here a few GL functions are not limited to the viewport. So, we need to track the actual window size and monitor changes to it. DRI2 does this by delivering invalidate events to the application whenever the current buffer isn t valid; the application discovers that this event has been delivered and goes to as the X server for the new buffers. There are a couple of problems with this approach:

Any outstanding DRM rendering requests will still draw to the old buffers.
The Invalidate events must be captured before the application sees the related ConfigureNotify event so that the GL library can react appropriately.

The first problem is pretty intractable within DRI2 the application has no way of knowing whether a frame that it has drawn was delivered to the correct buffer as the underlying buffer object can change at any time. DRI3 fixes this by having the application in control of buffer management; it can easily copy data from the previous back buffer to the new back buffer synchronized to its own direct rendering. The second problem was solved in DRI2 by using the existing Xlib event hooks; the GL library directly implements the Xlib side of the DRI2 extension and captures the InvalidateBuffers events within that code, delivering those to the driver code. The problem with this solution is that Xlib holds the Display structure mutex across this whole mess, and Mesa must be very careful not to make any Xlib calls during the invalidate call. For DRI3, I considered placing the geometry data in a shared memory buffer, but my future plans for the Present extension led me to want an X event instead (more about the Present extension in a future posting). An X ConfigureNotify event is sufficient for the current requirements to track window sizes accurately. However, there s no easy way for the GL library to ensure that ConfigureNotify events will be delivered to the application other application code may (and probably will) adjust the window event mask for its own uses. I considered adding the necessary event mask tracking code within XCB, but again, knowing that the Present extension would probably need additional information anyhow, decided to create a new event instead. Using an event requires that XCB provide some mechanism to capture those events, keep them from the regular X event stream, and deliver them to the GL library. A further requirement is that the GL library be absolutely assured of receiving notification about these events before the regular event processing within the application will see a core ConfigureNotify event. The method I came up with for XCB is fairly specific to my requirements. The events are always XGE events, and are tagged with a special event context ID , an XID allocated for this purpose. The combination of the extension op-code, the event type and this event context ID are used to split off these events to custom event queues using the following APIs:

/**
 * @brief Listen for a special event
 */
xcb_special_event_t *xcb_register_for_special_event(xcb_connection_t *c,
                                                    uint8_t extension,
                                                    uint16_t evtype,
                                                    uint32_t eid,
                                                    uint32_t *stamp);

This creates a special event queue which will contain only events matching the specified extension/type/event-id triplet.

/**
 * @brief Returns the next event from a special queue
 */
xcb_generic_event_t *xcb_check_for_special_event(xcb_connection_t *c,
                                                 xcb_special_event_t *se);

This pulls an event from a special event queue. These events will not appear in the regular X event queue and so applications will never see them. There s one more piece of magic here the stamp value passed to xcbregisterforspecialevent. This pointer refers to a location in memory which will be incremented every time an event is placed in the special event queue. The application can cheaply monitor this memory location for changes and known when to check the queue for events. Within GL, the value used is the existing dri2 stamp value. That is checked at the top of the rendering operation; if it has changed, the drawing buffers will be re-acquired. Part of the buffer acquisition process is a check for special events related to the window. For now, I ve placed these events in the DRI3 extension. However, they will move to the Present extension once that is working.

 
    DRI3SelectInput
    eventContext: DRI3EVENTID
    window: WINDOW
    eventMask: SETofDRI3EVENT
 
    Errors: Window, Value, Match, IDchoice
    Selects the set of DRI3 events to be delivered for the
    specified window and event context. DRI3SelectInput can
    create, modify or delete event contexts. An event context is
    associated with a specific window; using an existing event
    context with a different window generates a Match error.
    If eventContext specifies an existing event context, then if
    eventMask is empty, DRI3SelectInput deletes the specified
    context, otherwise the specified event context is changed to
    select a different set of events.
    If eventContext is an unused XID, then if eventMask is empty
    no operation is performed. Otherwise, a new event context is
    created selecting the specified events.

The events themselves look a lot like a configure notify event:

 
    DRI3ConfigureNotify
    type: CARD8         XGE event type (35)
    extension: CARD8        DRI3 extension request number
    length: CARD16          2
    evtype: CARD16          DRI3_ConfigureNotify
    eventID: DRI3EVENTID
    window: WINDOW
    x: INT16
    y: INT16
    width: CARD16
    height: CARD16
    off_x: INT16
    off_y: INT16
    pixmap_width: CARD16
    pixmap_height: CARD16
    pixmap_flags: CARD32
 
    'x' and 'y' are the parent-relative location of 'window'.

Note that there are a couple of odd additional fields offx, offy, pixmapwidth, pixmapheight and pixmap_flags are all place-holders for what I expect to end up in the Present extension. For now, in DRI3, they should be ignored. Synchronization The DRM application needs to know when various X requests related to its buffers have finished. In particular, when performing a buffer swap, the client wants to know when that completes, and be able to block until it has. DRI2 does this by having the application make a synchronous request from the X server to get the names of the new back buffer for drawing the next frame. This has two problems:

The synchronous round trip to the X server isn t free. Other running applications may cause fairly arbitrary delays in getting the reply back from the X server.
Synchronizing with the X server doesn t ensure that GPU operations are necessarily serialized between the application and the X server.

What we want is a serialization guarantee between the X server and the DRM application that operates at the GPU level. I ve written a couple of times (dri3k first steps and Shared Memory Fences) about using X Sync extension Fences (created by James Jones and Aaron Plattner) for this synchronization and wanted to get a bit more specific here. With the X server, a Sync extension Fence is essentially driver-specific, allowing the hardware design to control how the actual synchronization is performed. DRI3 creates a way to share the underlying operating system object by passing a file descriptor from application to the X server which somehow references that device object. Both sides of the protocol need to tacitly agree on what it means.

 
    DRI3FenceFromFD
    drawable: DRAWABLE
    fence: FENCE
    initially-triggered: BOOL
    fd: FD
 
    Errors: IDchoice, Drawable
    Creates a Sync extension Fence that provides the regular Sync
    extension semantics along with a file descriptor that provides
    a device-specific mechanism to manipulate the fence directly.
    Details about the mechanism used with this file descriptor are
    outside the scope of the DRI3 extension.

For the current GEM kernel interface, because all GPU access is serialized at the kernel API, it s sufficient to serialize access to the kernel itself to ensure operations are serialized on the GPU. So, for GEM, I m using a shared memory futex for the DRI3 synchronization primitive. That does not mean that all GPUs will share this same mechanism. Eliminate the kernel serialization guarantee and some more GPU-centric design will be required. What about Swap Buffers? None of the above stuff actually gets bits onto the screen. For now, the GL implementation is simply taking the X pixmap and copying it to the window at SwapBuffers time. This is sufficient to run applications, but doesn t provide for all of the fancy swap options, like limiting to frame rate or optimizing full-screen swaps. I ve decided to relegate all of that functionality to the as-yet-unspecified Present extension. Because the whole goal of DRI3 was to get direct rendered application contents into X pixmaps, the Present extension will operate on those X objects directly. This means it will also be usable with non-DRM applications that use simple X pixmap based double buffering, a class which includes most existing non-GL based Gtk+ and Qt applications. So, I get to reduce the size of the DRI3 extension while providing additional functionality for non direct-rendered applications. Current Status As I said above, all of the above functionality is running on my systems and has booted both complete KDE and Gnome sessions. There have been some recent DMA-BUF related fixes in the kernel, so you ll need to run the latest 3.9.x stable release or a 3.10 release candidate. Here s references to all of the appropriate git repositories: DRI3 protocol and spec:

git://people.freedesktop.org/~keithp/dri3proto      master

XCB protocol

git://people.freedesktop.org/~keithp/xcb/proto.git  dri3

XCB library

git://people.freedesktop.org/~keithp/xcb/libxcb.git dri3

xshmfence library:

git://people.freedesktop.org/~keithp/libxshmfence.git   master

X server:

git://people.freedesktop.org/~keithp/xserver.git    dri3

Mesa:

git://people.freedesktop.org/~keithp/mesa.git       dri3

Next Steps Now it s time to go write the Present extension and get that working. I ll start coding and should have another posting here next week.

22 May 2013

Keith Packard: Altos1.2.1

AltOS 1.2.1 TeleBT support, bug fixes and new AltosUI features Bdale and I are pleased to announce the release of AltOS version 1.2.1. AltOS is the core of the software for all of the Altus Metrum products. It consists of cc1111-based micro-controller firmware and Java-based ground station software. The biggest new feature for AltOS is the addition of support for TeleBT, our ground station designed to operate with Android phones and tablets. In addition, there s a change in the TeleDongle radio configuration that should improve range, some other minor bug fixes and new features in AltosUI AltOS Firmware Features and fixes There are bug fixes in both ground station and flight software, so you should plan on re-flashing both units at some point. However, there aren t any incompatible changes, so you don t have to do it all at once. New features:

TeleBT support.
Improved radio sensitivity. The TeleDongle receiver parameters have been tweaked to provide better reception.
TeleMini now completely resets all radio parameters in recovery mode (with the two outer debug pins connected) 434.550MHz, N0CALL, factory radio cal.

Bug fixes:

USB device fixes. This improves operation with Windows, avoiding hangs and errors in many cases.
Correct the Kalman filter error covariance matrix; the old parameters were built assuming continuous measurements.

AltosUI Easier to use AltosUI has also seen quite a bit of work for the 1.2.1 release. It s got several fun new features and a few bug fixes. New Graph UI features:

Show tool-tips with the value near the cursor.
Make the set of displayed values configurable. Add all of the available data values just in case you want to see them.
Added a Map tab showing the ground track of the whole flight.
The flight summary tab now includes the final GPS position. This lets you figure out where your rocket landed without replaying the whole flight.

Other new AltosUI features:

TeleBT support, including Bluetooth connections (Linux-only, at present).
Shows the callsign in the Monitor Idle and other command-mode windows so that you can tell what callsign is being used.
Show the block number when downloading flight data. This lets you see something happen even for longer flights.
Make the initial position of the AltosUI configurable so that you can position it out of the way of the rest of you desktop.
Distribute Mac OS X in .dmg format (Mac OS Disk Image); this means you don t need to explicitly unpack the bits.

Bug fixes:

Deal with broken networking while downloading map tiles. Tiles are now always downloaded asynchronously so that the UI doesn t freeze when the network is slow.

26 April 2013

Keith Packard: Shared Memory Fences

Shared Memory Fences In our last adventure, dri3k first steps, one of the future work items was to deal with synchronization between the direct rendering application and the X server. DRI2 handles this by performing a round trip each time the application starts using a buffer that was being used by the X server. As DRI3 manages buffer allocation within the application, there s really no reason to talk to the server, so this implicit serialization point just isn t available to us. As I mentioned last time, James Jones and Aaron Plattner added an explicit GPU serialization system to the Sync extension. These SyncFences serializing rendering between two X clients, but within the server there are hooks provided for the driver to use hardware-specific serialization primitives. The existing Linux DRM interfaces queue rendering to the GPU in the order requests are made to the kernel, so we don t need the ability to serialize within the GPU, we just need to serialize requests to the kernel. Simple CPU-based serialization gating access to the GPU will suffice here, at least for the current set of drivers. GPU access which is not mediated by the kernel will presumably require serialization that involves the GPU itself. We ll leave that for a future adventure though; the goal today is to build something that works with the current Linux DRM interfaces. SyncFence Semantics The semantics required by SyncFences is for multiple clients to block on a fence which a single client then triggers. All of the blocked clients start executing requests immediately after the trigger fires. There are four basic operations on SyncFences:

Trigger. Mark the fence as ready and wake up all waiting clients
Await. Block until the fence is ready.
Query. Retrieve the current state of the fence.
Reset. Unset the fence; future Await requests will block.

SyncFences are the same as Events as provided by Python and other systems. Of course all of the names have been changed to keep things interesting. I ll call them Fences here, to be consistent with the current X usage. Using Pthread Primitives One fact about pthreads that I recently learned is that the synchronization primitives (mutexes, barriers and semaphores) are actually supposed to work across process boundaries, if those objects are in shared memory mapped by each process. That seemed like a great simplification for this project; allocate a page of shared memory, map into the X server and direct rendering application and use the existing pthreads APIs. Alas, the pthread objects are architecture specific. I m pretty sure that when that spec was written, no-one ever thought of running multiple architectures within the same memory space. I went and looked at the code to check, and found that each of these objects has a different size and structure on x86 and x86_64 architectures. That makes it pretty hard to use this API within X as we often have both 32- and 64- bit applications talking to the same (presumably 64-bit) X server. As a last resort, I read through a bunch of articles on using futexes directly within applications and decided that it was probably possible to implement what I needed in an architecture-independent fashion. Futexes Linux Futexes live in this strange limbo of being a not-quite-public kernel interface. Glibc uses them internally to implement locking primitives, but it doesn t export any direct interface to the system call. Certainly they re easy to use incorrectly, but it s unusual in the Linux space to have our fundamental tools locked away for our own safety . Fortunately, we can still get at futexes by creating our own syscall wrappers.

static inline long sys_futex(void *addr1, int op, int val1,
                 struct timespec *timeout, void *addr2, int val3)
 
    return syscall(SYS_futex, addr1, op, val1, timeout, addr2, val3);

For this little exercise, I created two simple wrappers, one to block on a futex:

static inline int futex_wait(int32_t *addr, int32_t value)  
    return sys_futex(addr, FUTEX_WAIT, value, NULL, NULL, 0);

and one to wake up all futex waiters:

static inline int futex_wake(int32_t *addr)  
    return sys_futex(addr, FUTEX_WAKE, MAXINT, NULL, NULL, 0);

Atomic Memory Operations I need atomic memory operations to keep separate cores from seeing different values of the fence value, GCC defines a few such primitives and I picked _syncboolcompareandswap and _syncvalcompareandswap. I also need fetch and store operations that the compiler won t shuffle around:

#define barrier() __asm__ __volatile__("": : :"memory")
static inline void atomic_store(int32_t *f, int32_t v)
 
    barrier();
    *f = v;
    barrier();
 
static inline int32_t atomic_fetch(int32_t *a)
 
    int32_t v;
    barrier();
    v = *a;
    barrier();
    return v;

If your machine doesn t make these two operations atomic, then you would redefine these as needed. Futex-based Fences These wake-all semantics of Fences greatly simplify reasoning about the operation as there s no need to ensure that only a single thread runs past Await, the only requirement is that no threads pass the Await operation until the fence is triggered. A Fence is defined by a single 32-bit integer which can take one of three values:

0 - The fence is not triggered, and there are no waiters.
1 - The fence is triggered (there can be no waiters at this point).
-1 - The fence is not triggered, and there are waiters (one or more).

With those, I built the fence operations as follows. Here s Await:

int fence_await(int32_t *f)
 
    while (__sync_val_compare_and_swap(f, 0, -1) != 1)  
        if (futex_wait(f, -1))  
            if (errno != EWOULDBLOCK)
                return -1;
         
     
    return 0;

The basic requirement that the thread not run until the fence is triggered is met by fetching the current value of the fence and comparing it with 1. Until it is signaled, that comparison will return false. The compareandswap operation makes sure the fence is -1 before the thread calls futex_wait, either it was already -1 in the case where there were other waiters, or it was 0 before and is now -1 in the case where there were no waiters before. This needs to be an atomic operation so that the fence value will be seen as -1 by the trigger operation if there are any threads in the syscall. The futex_wait call will return once the value is no longer -1, it also ensures that the thread won t block if the trigger occurs between the swap and the syscall. Here s the Trigger function:

int fence_trigger(int32_t *f)
 
    if (__sync_val_compare_and_swap(f, 0, 1) == -1)  
        atomic_store(f, 1);
        if (futex_wake(f) < 0)
            return -1;
     
    return 0;

The atomic compareandswap operation will make sure that no Await thread swaps the 0 for a -1 while the trigger is changing the value from 0 to 1; either the Await switches from 0 to -1 or the Trigger switches from 0 to 1. If the value before the compareandswap was -1, then there may be threads waiting on the Fence. An atomic store, constructed with two memory barriers and a regular store operation, to mark the Fence triggered is followed by the futex_wake call to unblock all Awaiting threads. The Query function is just an atomic fetch:

int fence_query(int32_t *f)
 
    return atomic_fetch(f) == 1;

Reset requires a compareandswap so that it doesn t disturb things if the fence has already been reset and there are threads waiting on it:

void fence_reset(int32_t *f)
 
    __sync_bool_compare_and_swap(f, 1, 0);

A Request for Review Ok, so we ve all tried to create synchronization primitives only to find that our obvious implementations were full of holes. I d love to hear from you if you ve identified any problems in the above code, or if you can figure out how to use the existing glibc primitives for this operation.

12 April 2013

Keith Packard: dri3k first steps

DRI3K First Steps Here s an update on DRI3000. I ll start by describing what I ve managed to get working and then summarize discussions that happened on the xorg-devel mailing list. Private Back Buffers One of the big goals for DRI3000 is to finish the job of moving buffer management out of the X server and into applications. The only thing still allocated by DRI2 in the X server are back buffers; everything else moved to the client side. Yes, I know, this breaks the GLX requirement for sharing buffers between applications, but we just don t care anymore. As a quick hack, I figured out how to do this with DRI2 today allocate our back buffers separately by creating X pixmaps for them, and then using the existing DRI2GetBuffersWithFormat request to get a GEM handle for them. Of course, now that all I ve got is a pixmap, I can t use the existing DRI2 swap buffer support, so for now I m just using CopyArea to get stuff on the screen. But, that works fine, as long as you don t care about synchronization. Handling Window Resize The biggest pain in DRI2 has been dealing with window resize. When the window resizes in the X server, a new back buffer is allocated and the old one discarded. An event is delivered to invalidate the old back buffer, but anything done between the time the back buffer is discarded and when the application responds to the event is lost. You can easily see this with any GL application today resize the window and you ll see occasional black frames. By allocating the back buffer in the application, the application handles the resize within GL; at some point in the rendering process the resize is discovered, and GL creates a new buffer, copies the existing data over, and continues rendering. So, the rendered data are never lost, and every frame gets displayed on the screen (although, perhaps at the wrong size). The puzzle here was how to tell that the window was resized. Ideally, we d have the application tell us when it received the X configure notify event and was drawing the frame at the new size. We thought of a cute hack that might do this; track GL calls to change the viewport and make sure the back buffer could hold the viewport contents. In theory, the application would receive the X configure notify event, change the viewport and render at the new size. Tracking the viewport settings for an entire frame and constructing their bounding box should describe the size of the window; at least it should describe the intended size of the window. There s at least one serious problem with this plan applications may well call glClear before calling glViewport, and as glClear does not use the current viewport, instead clearing the whole window, we couldn t use the viewport as an indication of the current window size. However, what this exercise did lead us to realize was that we don t care what size the window actually is, we only care what size the application thinks it is. More accurately, the GL library just needs to be aware of any window configuration changes before the application, so that it will construct a buffer that is not older than the application knowledge of the window size. I came up with two possible mechanisms here; the first was to construct a shared memory block between application and X server where the X server would store window configuration changes and signal the application by incrementing a sequence number in the shared page; the GL library would simply look at the sequence number and reallocate buffers when it changed. The problem with the shared memory plan was that it wouldn t work across the network, and we have a future project in mind to replace GLX indirect rendering with local direct rendering and PutImage which still needs accurate window size tracking. More about that project in a future post though X Events to the Rescue So, I decided to just have the X server send me events when the window size changed. I could simply use the existing X configure notify events, but that would require a huge infrastructure change in the application so that my GL library could get those events and have the application also see them. Not knowing what the application is up to, we d have to track every ChangeWindowAttributes call and make sure the event_mask included the right bits. Ick. Fortunately, there s another reason to use a new event we need more information than is provided in the ConfigureNotify event; as you know, the Swap extension wants to have applications draw their content within a larger buffer that can have the window decorations placed around it to avoid a copy from back buffer to window buffer. So, our new ConfigureNotify event would also contain that information. Making sure that ConfigureNotify event is delivered before the core ConfigureNotify event ensures that the GL library should always be able to know about window size changes before the application. Splitting the XCB Event Stream Ok, so I ve got these new events coming from the X server. I don t want the application to have to receive them and hand them down to the GL library; that would mean changing every application on the planet, something which doesn t seem very likely at all. Xlib does this kind of thing by allowing applications to stick themselves into the middle of the event processing code with a callback to filter out the events they re interested in before they hit the main event queue. That s how DRI2 captures Invalidate events, and it works , but using callbacks from the middle of the X event processing code creates all kinds of locking nightmares. As discussed above, I don t care when GL sees the configure events, as long as it gets them before the application finds about about the window size change. So, we don t need to synchronously handle these events, we just need to be able to know they ve arrived and then handle them on the next call to a GL drawing function. What I ve created as a prototype is the ability to identify specific events and place them in a separate event queue, and when events are placed in that event queue, to bump a sequence number so that the application can quickly identify that there s something to process. Making the Event Mask Per-API Instead of Per-Client The problem described above about using the core ConfigureNotify events made me think about how to manage multiple APIs all wanting to track window configuration. For core events, the selection of which events to receive is all based on the client; each client has a single event mask, and each client receives one copy of each event. Monolithic applications work fine with this model; there s one place in the application selecting for events and one place processing them. However, modern applications end up using different APIs for 3D, 2D and media. Getting those libraries to cooperate and use a common API for event management seems pretty intractable. Making the X server treat each API as a separate entity seemed a whole lot easier; if two APIs want events, just have them register separately and deliver two events flagged for the separate APIs. So, the new DRI3 configure notify events are created with their own XID to identify the client-side owner of the event. Within the X server, this required a tiny change; we already needed to allocate an XID for each event selection so that it could be automatically cleaned up when the client exited, so the only change was to use the one provided by the client instead of allocating one in the server. On the wire, the event includes this new XID so that the library can use it to sort out which event queue to stick the event in using the new XCB event stream splitting code. Current Status The above section describes the work that I ve got running; with it, I can run GL applications and have them correctly track window size changes without losing a frame. It s all available on the dri3 branches of my various repositories for xcb proto, libxcb, dri3proto and the X server. Future Directions The first obvious change needed is to move the configuration events from the DRI3 extension to the as-yet-unspecified new Swap extension (which I may rename as Present , as in please present this pixmap in this window ). That s because they aren t related to direct rendering, but rather to tracking window sizes for off-screen rendering, either direct, indirect or even with the CPU to memory. DRI3 and Fences Right now, I m not synchronizing the direct rendering with the CopyArea call; that means the X server will end up with essentially random contents as the application may be mid-way through the next frame before it processes the CopyArea. A simple XSync call would suffice to fix that, but I want a more efficient way of doing this. With the current Linux DRI kernel APIs, it is sufficient to serialize calls that post rendering requests to the kernel to ensure that the rendering requests are themselves serialized. So, all I need to do is have the application wait until the X server has sent the CopyArea request down to the kernel. I could do that by having the X server send me an X event, but I think there s a better way that will extend to systems that don t offer the kernel serialization guarantee. James Jones and Aaron Plattner put together a proposal to add Fences to the X Sync extension. In the X world, those offer a method to serialize rendering between two X applications, but of course the real goal is to expose those fences to GL applications through the various GL sync extensions (including GLARBsync and GLNVfence). With the current Linux DRI implementation, I think it would be pretty easy to implement these fences using pthread semaphores in a block of memory shared between the server and application. That would be DRI-specific; other direct rendering interfaces would use alternate means to share the fences between X server and application. Swap/Present The Second Extension By simply using CopyArea for my application presentation step, I think I ve neatly split this problem into manageable pieces. Once I ve got the DRI3 piece working, I ll move on to fixing the presentation issue. By making that depend solely on existing core Pixmap objects as the source of data to present, I can develop that without any reference to DRI. This will make the extension useful to existing X applications that currently have only CopyArea for this operation. Presentation of application contents occurs in two phases; the first is to identify which objects are involved in the presentation. The second is to perform the presentation operation, either using CopyArea, or by swapping pages or the entire frame buffer. For offscreen objects, these can occur at the same time. For onscreen, the presentation will likely be synchronized with the scanout engine. The second form will mean that the Fences that mark when the presentation has occurred will need to signaled only once the operation completes. A CopyArea operation means that the source pixmap is ready immediately after the Copy has completed. Doing the presentation by using the source pixmap as the new front buffer means that the source pixmap doesn t become ready until after the next swap completes. What I don t know now is whether we ll need to report up-front whether the presentation will involve a copy or a swap. At this point, I don t think so the application will need two back buffers in all cases to avoid blocking between the presentation request and the presentation execution. Yes, it could use a fence for this, but that still sticks a bubble in the 3D hardware where it s blocked waiting for vblank instead of starting on the next frame immediately. Plan of Attack Right now, I m working on finishing up the DRI3 piece:

Replace the DRI2 buffer allocation kludge with actual local buffer allocation, mapping them into pixmaps using FD passing.
Replace the DRI2 authentication scheme with having the X server open the DRI object, preparing it for rendering and passing it back to the application.
Working on the XCB pieces to get the split event-queue stuff landed upstream.
Implementing the Fencing stuff to correctly serialize access to the pixmap.

The first three seem fairly straight forward. The fencing stuff will involve working with James and Aaron to integrate their XSync changes into the server. After that, I ll start working on the presentation piece. Foremost there is figuring out the right name for this new extension; I started with the name Swap as that s the GL call it implements. However, Swap is quite misleading as to the actual functionality; a name more like Present might provide a better indication of what it actually does. Of course, Present is both a verb and a noun, with very different connotations. Suggestions on this most complicated part of the project are welcome!

6 March 2013

Keith Packard: composite-swap

Composite and Swap Getting it Right Where the author tries to make sure DRI3000 is going to do what we want now and in the future DRI3000 The basic DRI3000 plan seems pretty straightforward:

Have applications allocate buffers full of new window contents, attach pixmap IDs to those buffers and pass them to the X server to get them onto the screen.
Provide a mechanism to let applications know when those pixmaps are idle so that they can reuse them instead of creating new ones for every frame.
Finally, allow the actual presentation of the contents to be scheduled for a suitable time in the future, generally synchronized with the monitor. Let the client know when this has happened in case they want to synchronize themselves to vblank.

The DRI3 extension provides a way to associate pixmap IDs and buffers, and given the MIT-SHM prototype I ve already implemented, I think we can safely mark this part as demonstrably implementable. That leaves us with a smaller problem, that of taking pixmap contents and presenting them on the screen at a suitable time and telling applications about the progress of that activity. In the absence of compositing, I m pretty sure the initial Swap extension design would do this job just fine, and should resolve some of the known DRI2 limitations related to buffer management. And, I think that goal is sufficient motivation to go and implement that. However, I wanted to write up some further ideas to see if the DRI3000 plan can be made to do precisely what we want in a composited world. The Composited Goal To make sure we re all on the same page, here s what I expect from the Swap extension in a composited world:

Application calls Swap with new window pixmap
Compositor hears about the new pixmap and uses that to construct a new screen pixmap
Compositor calls Swap with new screen pixmap
Vertical retrace happens, executing the pending swap operation
Compositor hears about the swap completion for the screen
Application hears about the swap completion for its window

In particular, applications should not hear that their swap operations are complete until the contents appear on the screen. This allows for applications to throttle themselves to the screen rate, either doing double or triple buffering as they choose. I didn t add steps here indicating buffers going idle or being allocated, because I think that should all happen behind the scenes from the application s perspective. Many applications won t care about the swap completion notification either, but some will and so that needs to be visible. Redirected Swaps? Owen Taylor suggested that one way of getting the compositor involved would be to have it somehow redirect Swap operations, much like we do with window management operations today. I think that idea may be a good direction to try:

Application calls Swap with new window pixmap
Swap is redirected to compositor, passing along the new window pixmap
Compositor constructs a new screen pixmap using the new window pixmap
Compositor calls Swap on the screen and the window, passing the new screen pixmap and the new window pixmap. When the screen update occurs, the screen and the window both receive swap completion events.

This has the added benefit that the X server knows when the compositor is expecting window pixmaps to change like this the compositor has to explicitly request Swap redirection. Window Pixmap Names and GEM Buffer Handles One issue that swapping window pixmaps around like this brings up is how to manage existing names for the window pixmap. Right now, applications expect that window pixmaps will only change when the window is resized. If the Swap extension is going to actually replace the window pixmap when running with a suitable compositor, then we need to figure out what the old names will reference. Are there non-compositor applications using NameWindowPixmap that matter to us? How about non-compositor applications using TextureFromPixmap to get a GEM handle for a window pixmap? For now, I m very tempted to just break stuff and see who complains, but knowing what we re breaking might be nice beforehand. Idling Pixmaps When an application is done drawing to a window pixmap and has passed it off to the X server for presentation, we d like for that pixmap to be automatically marked as discardable as soon as possible. This way, when memory is tight, the kernel can come steal those pages for something critical. Of course, applications may not want to let the server mark the pixmap as idle after being used, so a flag to the Swap call would be needed. Ideally, the pixmap would become idle immediately after the pixmap contents have been extracted. In the absence of a compositor, that would probably be when the Swap operation completes. With a compositor running, we d need explicit instruction from the compositor telling us that the window pixmap was now idle :

 
    SwapIdle
    drawable: Drawable
    pixmap: Pixmap

Furthermore, the application needs to know that the pixmap is in fact idle. I think that we ll need a synchronous X request that marks a buffer as no longer idle and have that return whether the buffer was discarded while idle. It doesn t seem sufficient to use events here as the application will need to completely reconstruct the pixmap contents in this case. This reply could also contain information about precisely what contents the pixmap does contain.

 
    SwapReuse
    drawable: Drawable
    pixmap: Pixmap
       
    valid: BOOL
    swap-hi: CARD32
    swap-lo: CARD32

Pixmap Lifetimes and Triple Buffered Applications If we redirect the Swap operation and send the original application window pixmap ID to the compositor, what happens when the application frees that pixmap before the compositor gets around to using the contents? Surely the Compositor must handle such cases, and not just crash. However, I m fine with requiring that the application not free the pixmap until told by the compositor.

28 February 2013

Keith Packard: x-on-resize

x-on-resize: a simple display configuration daemon I like things to be automated as much as possible, and having abandoned Gnome to their own fate and switched to xfce, I missed the automatic display reconfiguration stuff. I decided to write something as simple as possible that did just what I needed. I did this a few months ago, and when Carl Worth asked what I was using, I decided to pack it up and make it available. Automatic configuration with a shell script I ve had a shell script around that I used to bind to a key press which I d hit when I plugged or unplugged a monitor. So, all I really need to do is get this script run when something happens. The missing tool here was something to wait for a change to happen and automatically invoke the script I d already written. Resize vs Configure The first version of x-on-resize just listened for ConfigureNotify events on the root window. These get sent every time anything happens with the screen configuration, from hot-plug to notification when someone runs xrandr. That was as simple as possible; the application was a few lines of code to select for ConfigureNotify events, and invoke a program provided on the command line. However, it was a bit too simple as it would also respond to manual invocations of xrandr and call the script then as well. So, as long as I was content to accept whatever the script did, things were fine. And, with a laptop that had a DisplayPort connector for my external desktop monitor, and a separate VGA connector for projectors at conferences, the script always did something useful. Then I got this silly laptop that has only DisplayPort, and for which a dongle is required to get to VGA for projectors. I probably could write something fancy to figure out the difference between a desktop DisplayPort monitor and DisplayPort to VGA dongle, but I decided that solving the simpler problem of only invoking the script on actual hotplug events would be better. So, I left the current invoke-on-resize behavior intact and added new code that watched the list of available outputs and invoked a new config script when that set changed. The final program, x-on-resize, is available via git at

git://people.freedesktop.org/~keithp/x-on-resize

I even wrote a manual page. Enjoy!

20 February 2013

Keith Packard: DRI3000

DRI3000 Even Better Direct Rendering This all started with the presentation that Eric Anholt and I did at the 2012 X developers conference, and subsequently wrote about in my DRI-Next posting. That discussion sketched out the goals of changing the existing DRI2-based direct rendering infrastructure. Last month, I gave a more detailed presentation at Linux.conf.au 2013 (the best free software conference in the world). That presentation was recorded, so you can watch it online. Or, you can read Nathan Willis summary at lwn.net. That presentation contained a lot more details about the specific techniques that will be used to implement the new system, in particular it included some initial indications of what kind of performance benefits the overall system might be able to produce. I sat down today and wrote down an initial protocol definition for two new extensions (because two extensions are always better than one). Together, these are designed to provide complete support for direct rendering APIs like OpenGL and offer a better alternative to DRI2. The DRI3 extension Dave Airlie and Eric Anholt refused to let me call either actual extension DRI3000, so the new direct rendering extension is called DRI3. It uses POSIX file descriptor passing to share kernel objects between the X server and the application. DRI3 is a very small extension in three requests:

Open. Returns a file descriptor for a direct rendering device along with the name of the driver for a particular API (OpenGL, Video, etc).
PixmapFromBuffer. Takes a kernel buffer object (Linux uses DMA-BUF) and creates a pixmap that references it. Any place a Pixmap can be used in the X protocol, you can now talk about a DMA-BUF object. This allows an application to do direct rendering, and then pass a reference to those results directly to the X server.
BufferFromPixmap. This takes an existing pixmap and returns a file descriptor for the underlying kernel buffer object. This is needed for the GL Texture from Pixmap extension.

For OpenGL, the plan is to create all of the buffer objects on the client side, then pass the back buffer to the X server for display on the screen. By creating pixmaps, we avoid needing new object types in the X server and can use existing X apis that take pixmaps for these objects. The Swap extension Once you ve got direct rendered content in a Pixmap, you ll want to display it on the screen. You could simply use CopyArea from the pixmap to a window, but that isn t synchronzied to the vertical retrace signal. And, the semantics of the CopyArea operation precludes us from swapping the underlying buffers around, making it more expensive than strictly necessary. The Swap extension fills those needs. Because the DRI3 extension provides an X pixmap reference to the direct rendered content, the Swap extension doesn t need any new object types for its operation. Instead, it talks strictly about core X objects, using X pixmaps as the source of the new data and X drawables as the destination. The core of the Swap extension is one request SwapRegion. This request moves pixels from a pixmap to a drawable. It uses an X fixes Region object to specify the area of the destination being painted, and an offset within the source pixmap to align the two areas. A bunch of data are included in the reply from the SwapRegion request. First, you get a 64-bit sequence number identifying the swap itself. Then, you get a suggested geometry for the next source pixmap. Using the suggested geometry may result in performance improvements from the techniques described in the LCA talk above. The last bit of data included in the SwapRegion reply is a list of pixmaps which were used as source operands to earlier SwapRegion requests to the same drawable. Each pixmap is listed along with the 64-bit sequence number associated with an earlier SwapRegion operation which resulted in the contents which the pixmap now contains. Ok, so that sounds really confusing. Some examples are probably necessary.

If the SwapRegion operation was implemented by copying data out of the source pixmap into the destination drawable, then the idle swap count will be equal to the swap count from this SwapRegion operation.
If the SwapRegion operation was implemented by swapping the destination contents with the source contents, then the idle swap count will be equal to the previous swap count on the destination drawable.

I m hoping you ll be able to tell that in both cases, the idle swap count tries to name the swap sequence at which time the destination drawable contained the contents currently in the pixmap. Note that even if the SwapRegion is implemented as a Copy operation, the provided source pixmap may not be included in the idle list as the copy may be delayed to meet the synchronization requirements specfied by the client. Finally, if you want to throttle rendering based upon when frames appear on the screen, Swap offers an event that can be delivered to the drawable after the operation actually takes place. Because the Swap extension needs to supply all of the OpenGL SwapBuffers semantics (including a multiplicity of OpenGL extensions related to that), I ve stolen a handful of DRI2 requests to provide the necessary bits for that:

SwapGetMSC
SwapWaitMSC
SwapWaitSBC

These work just like the DRI2 requests of the same names. Current State of the Extensions Both of these extensions have an initial protocol specification written down and stored in git:

8 February 2013

Keith Packard: MicroPeakUSB

MicroPeak USB Interface now available. Altus Metrum is pleased to announce the immediate availability of the MicroPeak USB interface.

Connects the MicroPeak altimeter to USB.
Records 48 seconds of flight
Barometric pressure values recorded about 5 times per second.
Save, graph and export pressure, height, speed and acceleration data.
Supports Linux, Mac OS X and Windows. The software is available on the MicroPeak web page.
Learn more at http://www.altusmetrum.org/MicroPeakUSB
Buy these for $35 at the gag.com web store for Altus Metrum products. Or get one bundled with a MicroPeak and save $10.

MicroPeak USB Interface

MicroPeak and the MicroPeak USB Interface

MicroPeak is fun to use all by itself, providing a quick way to know how high your rocket has flown. But, for those people itching for more data, MicroPeakUSB offers a way to download raw flight data and analyze that on your computer. MicroPeakUSB doesn t require any changes to the MicroPeak hardware new MicroPeak firmware transmits the entire flight log through the on-board LED to a phototransistor on the MicroPeakUSB Interface and then to the USB port on your computer. Existing MicroPeak owners can contact us for a special deal on the MicroPeak USB interface and upgrading the MicroPeak firmware.

30 December 2012

Keith Packard: MicroPeakSerial

MicroPeak Serial Interface Flight Logging for MicroPeak MicroPeak was original designed as a simple peak-recording altimeter. It displays the maximum height of the last flight by blinking out numbers on the LED. Peak recording is fun and easy, but you need a log across apogee to check for unexpected bumps in baro data caused by ejection events. NAR also requires a flight log for altitude records. So, we wondered what could be done with the existing MicroPeak hardware to turn it into a flight logging altimeter. Logging the data The 8-bit ATtiny85 used in MicroPeak has 8kB of flash to store the executable code, but it also has 512B (yes, B as in bytes ) of eeprom storage for configuration data. Unlike the code flash, the little eeprom can be rewritten 100,000 times, so it should last for a lifetime of rocketry. The original MicroPeak firmware already used that to store the average ground pressure and minimum pressure (in Pascals) seen during flight; those are used to compute the maximum height that is shown on the LED. If we store just the two low-order bytes of the pressure data, we d have room left for 251 data points. That means capturing data at least every 32kPa, which is about 3km at sea level. 251 points isn t a whole lot of storage, but we really only need to capture the ascent and arc across apogee, which generally occurs within the first few seconds of flight. MicroPeak samples air pressure once every 96ms, if we record half of those samples, we ll have data every 192ms. 251 samples every 192ms captures 48 seconds of flight. A flight longer than that will just see the first 48 seconds. Of course, if apogee occurs after that limit, MicroPeak will still correctly record that value, it just won t have a continuous log. Downloading the data Having MicroPeak record data to the internal eeprom is pretty easy, but it s not a lot of use if you can t get the data into your computer. However, there aren t a whole lot of interfaces avaialble on MicroPeak. We ve only got:

The 6-pin AVR programming header. This is how we load firmware onto MicroPeak during manufacturing. It s not locked (of course), and the hardware supports reading and writing of flash, ram and eeprom.
The LED. We already use this to display the maximum height of the previous flight, can we blink it faster and then get the computer to read it out?

First implementation I changed the MicroPeak firmware to capture data to eeprom and made a test flight using my calibrated barometric chamber (a large syringe). I was able to read out the flight data using the AVR programming pins and got the flight logging code working that way. The plots I created looked great, but using an AVR programmer to read the data looked daunting for most people as it requires:

An AVR programmer. Adafruit sells the surprisingly useful USBtinyISP programmer. There are Windows drivers available for this, and you can get the necessary avrdude binaries from the usbtiny page above. The programmer itself comes in kit form, so you have to solder it together. There are other programmers available for a bit more than come pre-assembled, but all of them require that you wander around the net finding the necessary drivers and programming software.
A custom MicroPeak programming jig. We have these for sale in the Altus Metrum web store but, because they need special pogo pins and a pile of custom circuit boards, they re not cheap to make.

With the hardware running at least $120 retail, and requiring a pile of software installed from various places around the net, this approach didn t seem like a great way to let people easily capture flight data from their tiny altimeter. The Blinking LED The only other interface available is the MicroPeak LED. It s a nice LED, bright and orange and low power. But, it s still just a single LED. However, it seemed like it might be possible to have it blink out the data and create a device to watch the LED and connect that to a USB port. The simplest idea I had was to just blink out the data in asynchronous serial form; a start bit, 8 data bits and a stop bit. On the host side, I could use a regular FTDI FT230 USB to serial converter chip. Those even have a 3.3V regulator and can supply a bit of current to other components on the board, eliminating the need for an external power supply. To see the LED blink, I needed a photo-transistor that actually responds to the LED s wavelength. Most photo-transistors are designed to work with infrared light, which nicely makes the whole setup invisible. There are a few photo-transistors available which do respond in the visible range, and ROHM RPM-075PT actually has its peak sensitivity right in the same range as the LED. In between the photo-transistor and the FT230, I needed a detector circuit which would send a 1 when the light was present and a 0 when it wasn t. To me, that called for a simple comparator made from an op-amp. Set the voltage on the negative input to somewhere between light and dark and then drive the positive input from the photo-transistor; the output would swing from rail to rail. Bit-banging async The ATtiny85 has only a single serial port , which is used on MicroPeak to talk to the barometric sensor in SPI mode. So, sending data out the LED requires that it be bit-banged directly modulated with the CPU. I wanted the data transmission to go reasonably fast, so I picked a rate of 9600 baud as a target. That means sending one bit every 104 S. As the MicroPeak CPU is clocked at only 250kHz, that leaves only about 26 cycles per bit. I need all of the bits to go at exactly the same speed, so I pack the start bit, 8 data bits and stop bit into a single 16 bit value and then start sending. Of course, every pass around the loop would need to take exactly the same number of cycles, so I carefully avoided any conditional code. With that, 14 of the 26 cycles were required to just get the LED set to the right value. I padded the loop with 12 nops to make up the remaining time. At 26 cycles per bit, it s actually sending data at a bit over 9600 baud, but the FT230 doesn t seem to mind. A bit of output structure I was a bit worried about the serial converter seeing other light as random data, so I prefixed the data transmission with MP ; that made it easy to ignore anything before those two characters as probably noise. Next, I decided to checksum the whole transmission. A simple 16-bit CRC would catch most small errors; it s easy enough to re-try the operation if it fails after all. Finally, instead of sending the data in binary, I displayed each byte as two hex digits, and sent some newlines along to keep the line lengths short. This makes it easy to ship flight logs in email or whatever. Here s a sample of the final data format:

MP
dc880100fec000006800f56d8f63b059
73516447273fa93728301927d91b7712
730bbf0491fe88f7c5ee8ee896e3fadc
9dd9d3d502d1afcea2cbafc6b4c34ec1
bfbfcabf10c03dc05dc070c084c08fc0
9cc0abc0b9c0c1c0ccc0dcc020c152c4
71c9a6cf45d623db7de05ee758edd9f2
b4f9fd00aa074311631a9221c4291330
c035873b2943084bbb52695c0c67eb6b
d26ee5707472fb74a4781f7dee802b84
09860a87e786ad868a866e8659865186
4e8643863e863986368638862e862d86
2f862d86298628862a86268629862686
28862886258625862486
d925

Making the photo-transistor go fast enough The photo-transistor acts as one half of a voltage divider on the positive op-amp terminal, with a resistor making the other half. However, the photo-transistor acts a bit like a capacitor, so when I initially chose a fairly large value for the resistor, it actually took too long to switch between on and off the transistor would spend a bunch of time charging and discharging. I had to reduce the resistor to 1k for the circuit to work. Remaining hardware design I prototyped the circuit on a breadboard using a through-hole op-amp that my daughter designed into her ultrasonic guided robot and a prefabricated FTDI Friend board. I wanted to use the target photo-transistor, so I soldered a couple of short pieces of wire onto the SMT pads and stuck that into the breadboard. Once I had that working, I copied the schematic to gschem, designed a board and had three made at OSHPark for the phenomenal sum of $1.35. Aside from goofing up on the FT230 USB data pins (swapping D+ and D-), the board worked perfectly.

The final hardware design includes an LED connected to the output of the comparator that makes it easier to know when things are lined up correctly, otherwise it will be essentially the same. Host software Our AltosUI code has taught us a lot about delivering code that runs on Linux, Mac OS X and Windows, so I m busy developing something based on the same underlying Java bits to support MicroPeak. Here s a sample of the graph results so far:

Production plans I ve ordered a couple dozen raw boards from OSH Park, and once those are here, I ll build them and make them available for sale in a couple of weeks. The current plan is to charge $35 for the MicroPeak serial interface board, or sell it bundled with MicroPeak for $75.

27 December 2012

Tanguy Ortolo: Tiling window managers

Floating and tiling window managers In the X Window System, the window manager is that piece of software that places your windows and allows you to move them, resize them, hide them, etc. If your windows have titles on top of them, with buttons to close them or reduce them, it is thanks to the window manager. There are two major types of window managers:

Floating window managers: They are the most usual window managers, that allow you to place an size your windows freely on the screen, in a way where they are independent of each other, possibly overlapping, just as you would be able to place sheets of paper on your desk.
Tiling window managers: They are a more elitist type of window manager, that adjust the size and position of the windows so there is no overlapping and no space lost between windows, thus tiling the screen.

Four computer windows managed in floating mode

Floating window management

Four computer windows managed in tiling mode

Tiling window management

The frontier between these two types is not very tight, because some floating window managers have limited tiling features, and almost all the tiling window manager have floating modes for programs that are not adapted to tiling. For what it is worth, here is a report on my experience with three tiling window managers. Perhaps it may help people that are still hesitating to switch to tiling window management?Using tiling window managers Five years ago I switched to tiling window management, feeling that managing windows was the window manager's job (a quote from larswm). At that time, I started using wmii, then switched to awesome, and now I am trying i3.

wmii: three columns, first one stacked ()

Screenshot of the awesome window manager

awesome: fair layout

i3: three containers, second one stacked, third one tabbed

For people in a hurry, here is a comparative table:

	wmii	awesome	i3
Configuration	Shell script	Lua script	config file
Scriptability	any language, 9P-based	Lua, API-based	any language, socket-based
Layout	simple, column-based	automatic layout-based	flexible, manual tiling-based
Multi-monitor	sucks	OK	OK
Modern stuff		XCB, Xft, notification area, dock windows	XCB, Xft, notification area
Special features	tag-based workspace system	XDG convention to locate the config file real framework for customization	powerful status bar

wmii wmii is a rather minimalist window manager which used to be part of the suckless project, but is now hosted at Google Code and seems to have lost all its documentation during the move (why is it that all the software I have seen hosted at Google Code or worst, at Launchpad, have almost not documentation, and never, ever a single screenshot?). It uses a column-based layout: you place windows in a number of columns, and for each column you can choose either vertical split, stacking or full column mode. This system is very simple to control flexible enough for most situations, but it does not allow for arbitrary tiling. It is fully scriptable in any language, in an interesting way: it exposes all its functions on a 9P virtual filesystem. In fact, wmii itself only implements window management functions, and all the user interaction logic takes place in a distinct script which calls that functions using that 9P filesystem. wmii comes with a default script, written in shell, which makes it a bit hard to extend, and slow if you start calling external programs such as grep, sed and co. You can script it in any language you like however, and there are some already made implementations in Ruby and Python IIRC. A special feature of wmii is that its workspaces are in fact tags, and that you can tag a window so that it will appear in several ones. While this is interesting, in practice I did not find much use for that feature. It is explicitly minimalist, and the developers used to impose themselves a limit on its numbers of lines of codes (not sure if this is still true with the switch to Google Code). While I was using it, it had no notification area, no support for Xft fonts and it sucked with multiple monitors, which is the reason I switched to awesome. awesome awesome is a very flexible window manager that provides advanced automatic layouts. It has become quite popular, and it has a very well documented wiki. It is not officially a tiling window manager but a framework window manager, and to emphasize on that it start in floating mode by default. I think most people use it in tiling mode however. It uses automatic layouts, that place your windows according to rules, for instance the fair layout, which tiles in columns and rows so that each window occupies a similar space. Thanks to that automation, this system is very easy to control and to get used to, and although it certainly does not allow arbitrary tiling, you can cycle between several available layouts which are suitable for most situations (most of them are entirely automatic, but some have parameters you can modify, such as the number and width of columns), and if you miss one, I think you can even code your own. It is fully scriptable using the Lua language. In fact, just as wmii, the user interaction logic is defined in the configuration file, which allows efficient customization. The Lua API is fully documented, and there is a series of useful libraries to extend the basic configuration in any way you like, which is why awesome calls itself a framework window manager . It is not designed to be minimalist, and it implements some modern stuff, such as using XCB rather than Xlib, a notification area, and specific support for dock or utility-type windows such as GIMP's tools. i3 i3 is a window manager inspired by wmii, although they do not have much in common in my opinion. for what I have seen, I think it would be closer the defunct Ion. It uses a layout based on manual splitting: as you open windows you can choose to split an existing window either vertically or horizontally, leading to arbitrarily complex layouts. In addition to that you can tab or stack windows in containers instead of splitting them. This system is very flexible but it requires more user intervention. In its standard mode of operation, i3 is simply configured in a regular way, which allows to customize the user interaction but not to script it in an arbitrary way. It offers an IPC system that can be used for that however, by the mean of a Unix socket, so it can actually be scripted in any language too if needed, although this possibility is probably not as popular and easy as with wmii and awesome. It implements modern features such as XCB, a notification area and a powerful status bar that uses the standard output of a dedicated program that is easy to replace by your own if you need.

23 November 2012

Keith Packard: MicroPeak

MicroPeak tiny peak-recording altimeter now available MicroPeak is a miniature peak-recording altimeter. About the same size and weight as a US dime (with battery), MicroPeak offers fabulous accuracy (20cm or 8in at sea level) and wide range (up to 31km or 101k ).

Uses the Measurement Specialties MS5607 barometric sensor.
Includes built-in battery holder for easily replaceable CR1025 lithium battery
Compact design is only 18mm x 14mm or 0.7 x 0.56 . Weighs 1.9g including the battery.
Low power design lasts for over 40 hours in flight.
Auto-poweroff on landing.
Learn more at the Altus Metrum web site
Buy these at the gag.com web store for Altus Metrum products

The size of the board was predicated with the premise that we needed a battery included to avoid having wiring running between the altimeter and the board, we found some small lithium coin-cell battery holders for the CR1025 battery. These battery holders are rated to hold the battery secure up to 150gs. We d already started playing with the Measurement Specialties MS5607 pressure sensor which offers amazing accuracy while using very little power. Taking full-precision measurements every 96ms consumes about .2mA on average. Once on the ground, we stop taking measurements entirely, dropping the power use to around 1 A. It s also pretty small, measuring only 5mm x 3mm. For a CPU, this little project didn t need much. The 8-bit ATtiny85 comes in a 20qwfn package which is only 4mm x 4mm. When run at full speed (8MHz), it consumes a couple of mA of power. Reduce the clock to a pokey 250kHz and the CPU has enough CPU power to track altitude while consuming less than .2mA on average. To avoid losing the battery, we wanted to avoid having it removed while the board wasn t in use. So, we added a little power switch to the board. The one we found is good to at least 50g. Finally, we wanted to find a nice bright LED to show the state of the device and to blink out the final altitude. The OSRAM LO T67K are bright-orange surface-mount LEDs that run happily on 2mA. We used OSHPark.com to create prototype circuit boards for this project. Because of the small size of the board, each prototype run cost only $2 for three boards. It takes a couple of weeks to get boards, but it s really hard to beat the price. All of the schematic and circuit board artwork are published under the TAPR Open Hardware License and are available via git. All of the source code is published under the GPLv2 and is included in the main AltOS source repository.

Next.

Previous.